From aed5b081b4f96fb5eec3e837b9e82d500da0c017 Mon Sep 17 00:00:00 2001 From: Sungwoo Kim Date: Sat, 10 Aug 2024 10:58:40 -0400 Subject: [PATCH] add papers --- .../Declarative Recursive Computation on an RDBMS | 1 + ... for Certifiable Defense against Patch Attacks | 1 + ...usible 3D Human Models to Ambiguous Image Data | 1 + ...3D Self-Supervised Methods for Medical Imaging | 1 + .../3D Shape Reconstruction from Vision and Touch | 1 + ...nse Large-Scale Networks: Design and Inference | 1 + ...g Algorithm and Applications to Auction Design | 1 + ... Nonparametrics View into Deep Representations | 1 + ...spective on Training Speed and Model Selection | 1 + ...eralization in Grounded Language Understanding | 1 + ...sible Neural Network for Slow Feature Analysis | 1 + ...oolean Task Algebra for Reinforcement Learning | 1 + .../A Catalyst Framework for Minimax Optimization | 1 + ...A Causal View on Robustness of Neural Networks | 1 + ...ithms for General Instrumental Variable Models | 1 + .../A Closer Look at Accuracy vs. Robustness | 1 + ...the Training Strategy for Modern Meta-Learning | 1 + ...Combinatorial Perspective on Transfer Learning | 1 + ...n between Private Learning and Online Learning | 1 + ...ror Descent Approach to Sparse Phase Retrieval | 1 + ...m for Simulations of Multi-modal Distributions | 1 + ...Assembly and Viral Quasispecies Reconstruction | 1 + ...rithm for Training Generative Adversarial Nets | 1 + ... to Domain-Invariant Learning in Deep Networks | 1 + ...opic Model without the Reparametrization Trick | 1 + ...tral Limit Theorem for Shallow Neural Networks | 1 + ...air Classifier Using Kernel Density Estimation | 1 + ... for Nonconvex Sparse Constrained Optimization | 1 + ...nalysis of Two Time-Scale Actor-Critic Methods | 1 + ...iors with Adaptive Smoothing and Game Encoding | 1 + ...s of Additive Adversarial Attacks and Defenses | 1 + ...aximization Algorithm with Endogenous Sampling | 1 + ... Framework for Solving Integer Linear Programs | 1 + ...eneral Method for Robust Learning from Batches | 1 + ... Kernel Analysis for Two-layer Neural Networks | 1 + ...roup-Theoretic Framework for Data Augmentation | 2 ++ .../A Limitation of the PAC-Bayes Framework | 2 ++ ...Code for Distributional Reinforcement Learning | 1 + ...al Networks Based on Watson's Perceptual Model | 1 + ... and Its Application to Co-occurrence Matrices | 1 + ...o Off-Policy Evaluation in Average-Reward MDPs | 1 + ...Approach to Kernel Conditional Mean Embeddings | 1 + ...nalysis for Stein Variational Gradient Descent | 1 + ...r Constrained Optimization in Graphical Models | 1 + ...ategy to Solve Hard Sokoban Planning Instances | 1 + ...thm to Reduce the Support of Discrete Measures | 1 + ...ification and Localisation in Object Detection | 1 + ...l EM Algorithm for Incomplete Panel Count Data | 1 + ...vacy-Preserving Collaborative Machine Learning | 1 + ...r Learning Optimal Multivariate Decision Trees | 1 + .../neurips/A Self-Tuning Actor-Critic Algorithm | 1 + ...mple Language Model for Task-Oriented Dialogue | 1 + ... for Faster Optimization and Local Exploration | 1 + ...ion with Adversarial or Stochastic Constraints | 1 + ...gorithm for Nonconvex-Concave Min-Max Problems | 1 + ... Energy Distance for Parallel Speech Synthesis | 1 + ... 
Low-bitwidth Training of Deep Neural Networks | 1 + ...ask-Agnostic Sample Design in Machine Learning | 1 + ...l EstimatoR Expectation Maximization Algorithm | 1 + ...dy on Encodings for Neural Architecture Search | 2 ++ ...A Theoretical Framework for Target Propagation | 1 + ... Bound and Efficient Reduction for Swap Regret | 1 + ...pological Filter for Learning with Label Noise | 1 + ... Convergence Analysis of Q-Learning Algorithms | 1 + .../A Unified View of Label Shift Estimation | 1 + ...of Optimism in Episodic Reinforcement Learning | 1 + ...works for Expressing Probability Distributions | 1 + ... for Learning from Positive and Unlabeled Data | 1 + ...al view of compositional zero-shot recognition | 2 ++ ...zation formulation for multivariate regression | 1 + ...ement using multi-agent reinforcement learning | 1 + .../neurips/A kernel test for quasi-independence | 1 + ... automatic differentiation in machine learning | 1 + ...thematical theory of cooperative communication | 1 + ...an-field analysis of two-player zero-sum games | 1 + ...carve a desired function into a neural network | 1 + ... Q-learning with linear function approximation | 1 + ...zed linear models of noisy interacting neurons | 1 + ...ariational form of the Schatten-$p$ quasi-norm | 1 + ...rithm for learning nonparametric causal graphs | 1 + .../A shooting formulation of deep learning | 1 + ...mates local non-Hebbian learning in the cortex | 1 + ...ymbolic regression exploiting graph modularity | 1 + ... Based Identity Swapping for Forgery Detection | 1 + ...Expanding Receptive Field for Dense Prediction | 1 + ...aset: Anonymized Videos from Diverse Countries | 1 + ...anguage Models with Progressive Layer Dropping | 1 + .../Acceleration with a Ball Optimization Oracle | 1 + ...alized Odds by Resampling Sensitive Attributes | 1 + ...iction: Experiment Selection through Stability | 1 + ...rning of Causal DAGs via Directed Clique Trees | 1 + ... Stepsizes by the Belief in Observed Gradients | 1 + ...o Share For Efficient Deep Multi-Task Learning | 1 + ...tive Tensor Program Compilation Made Efficient | 1 + .../Adam with Bandit Sampling for Deep Learning | 1 + ...Allow Identification of Optimized Neural Codes | 1 + .../Adapting Neural Architectures Between Domains | 1 + ...ting to Misspecification in Contextual Bandits | 1 + ...ization for Model-Based Reinforcement Learning | 2 ++ ...al Interference: A Maximum Likelihood Approach | 2 ++ ...ve Gradient Quantization for Data-Parallel SGD | 1 + ...onal Recurrent Network for Traffic Forecasting | 1 + ...zation and Sampling with Decreasing Step-Sizes | 1 + ...dels for Efficient Pairwise Sequence Alignment | 1 + ...line Estimation of Piecewise Polynomial Trends | 1 + ...ive Probing Policies for Shortest Path Routing | 1 + .../2020/neurips/Adaptive Reduced Rank Regression | 1 + ...e Sampling for Stochastic Risk-Averse Learning | 1 + ...tive Shrinkage Estimation for Streaming Graphs | 1 + ...ox Adversarial Attacks using Normalizing Flows | 1 + ... 
Flows, Importance Weighting, and Optimization | 1 + .../Adversarial Attacks on Deep Graph Matching | 1 + ...versarial Attacks on Linear Contextual Bandits | 1 + ...ns: Regret Lower Bound and No-regret Algorithm | 0 data/2020/neurips/Adversarial Blocking Bandits | 1 + ...cing Through Robust Rank-One Matrix Completion | 1 + ...stributional Training for Robust Deep Learning | 1 + data/2020/neurips/Adversarial Example Games | 1 + ...dversarial Learning for Robust Deep Clustering | 1 + ...rsarial Robustness of Supervised Sparse Coding | 1 + ...versarial Self-Supervised Contrastive Learning | 1 + ...Imitation Learning without Policy Optimization | 1 + ...Sparse Transformer for Time Series Forecasting | 1 + ...ng for One-Shot Unsupervised Domain Adaptation | 1 + ...of Data-dependent Operator Norm Regularization | 1 + ...eight Perturbation Helps Robust Generalization | 1 + ...robustness via robust low rank representations | 4 ++++ ...st Few-Shot Learning: A Meta-Learning Approach | 1 + ... Streaming Algorithms via Differential Privacy | 1 + ...semble of Discrete Undirected Graphical Models | 1 + ...s on Approximation Error and Sample Complexity | 1 + ...rning of a Single Neuron with Gradient Descent | 1 + .../Agnostic Learning with Multiple Objectives | 1 + ...emble Knowledge Distillation in Gradient Space | 1 + ...ect causal knowledge: a probabilistic approach | 1 + .../All Word Embeddings from One Embedding | 1 + ...transitions in sparse spiked matrix estimation | 1 + ... Learningvia Reference-Advantage Decomposition | 1 + .../neurips/Almost Surely Stable Deep Dynamics | 1 + ...n Analysis of SVD for Deep Rotation Estimation | 1 + ...mental Algorithm for Contextual Linear Bandits | 1 + ...fficient Adversarial Attack for Tree Ensembles | 1 + ... Evolutionary and Gradient-based Policy Search | 1 + ...ent Framework for Clustered Federated Learning | 1 + ...lgorithms for Combinatorial and Linear Bandits | 1 + ... and Non-Uniform Sampling in Experience Replay | 1 + ...ch to Transfer Learning with Dynamics Mismatch | 1 + ...y Gradient and Natural Policy Gradient Methods | 1 + ...s of Stochastic Gradient Descent with Momentum | 1 + ... Elimination Algorithm for Learning a Best Arm | 1 + ... Estimator for Learning with Augmented Classes | 1 + ...nformation-Theoretic Perceptual Quality Metric | 1 + ...etworks dynamics for hinge loss classification | 1 + ...tion of stagewise convex optimization problems | 1 + ...rning approach for parametric modal regression | 1 + .../An operator view of policy gradient methods | 1 + ...son Sampling for Stochastic Partial Monitoring | 1 + ...ian in Shallow ReLU Models: A Tale of Symmetry | 1 + ...tion-Maximization for Deep Generative Networks | 1 + ...cations of Common Entropy for Causal Inference | 1 + ...oximate Cross-Validation for Structured Models | 1 + ...lidation with Low-Rank Data in High Dimensions | 1 + ...ained Learning with Lagrange Multiplier Models | 1 + ...nce Reduction for Reparameterization Gradients | 1 + ... 
Ability to Solve the Symbol Grounding Problem | 1 + ...g: A Framework for Multi-Organization Learning | 1 + ...l knowledge into model-agnostic explainability | 1 + ...eling Based on the Smooth Wasserstein Distance | 1 + ...rs neural network in the random features model | 1 + ...ly Optimal Exact Minibatch Metropolis-Hastings | 1 + ...es, You Really Can Backdoor Federated Learning | 1 + ...ment Learning Model for Traffic Signal Control | 1 + ...n implement reward-based error backpropagation | 1 + ...ibute Prototype Network for Zero-Shot Learning | 1 + ...ompression for Reliable Network Interpretation | 1 + ...udio Generation for a Silent Performance Video | 1 + ... Machine Learning: How Private is Private SGD? | 1 + data/2020/neurips/Auto Learning Attention | 1 + ... Architecture Search for Panoptic Segmentation | 1 + ...ient Algorithm for Block Stacking Style Search | 1 + ... Selection for Secure Neural Network Inference | 2 ++ ...ze for Data-Parallel Distributed Deep Learning | 1 + ...coders that don't overfit towards the Identity | 1 + .../Autofocused oracles for model-based design | 1 + ...Curriculum Learning through Value Disagreement | 1 + ...s for Scalable Certified Robustness and Beyond | 1 + ...ity-aware Surrogates for Optimization Problems | 1 + data/2020/neurips/Autoregressive Score Matching | 1 + ...ary Task Reweighting for Minimum-data Learning | 1 + data/2020/neurips/AvE: Assistance via Empowerment | 1 + ...iding Side Effects By Considering Future Tasks | 1 + .../Avoiding Side Effects in Complex Environments | 1 + .../Axioms for Learning from Pairwise Comparisons | 1 + ...Learning for Batch Deep Reinforcement Learning | 1 + ...nce: Fast and Robust Inference with Early Exit | 1 + ...BOSS: Bayesian Optimization over String Spaces | 1 + .../BRP-NAS: Prediction-based NAS using GCNs | 1 + ...proves Transferability of Adversarial Examples | 1 + ...Bad Global Minima Exist and SGD Can Reach Them | 1 + ...eta-Softmax for Long-Tailed Visual Recognition | 1 + data/2020/neurips/Bandit Linear Control | 1 + ...it Samplers for Training Graph Neural Networks | 1 + ...e k-Medoids Clustering via Multi-Armed Bandits | 1 + ...pproach to search over molecule synthesis DAGs | 1 + ...ollapse for randomly initialised deep networks | 1 + .../Batched Coarse Ranking in Multi-Armed Bandits | 1 + data/2020/neurips/Baxter Permutation Process | 1 + ...onal Learning for Multi-omics Data Integration | 1 + ... Loss Functions and the Scoring Function Class | 1 + data/2020/neurips/Bayesian Attention Modules | 1 + ...yesian Bits: Unifying Quantization and Pruning | 1 + ...g with Zero-Inflated Poisson Bayesian Networks | 1 + ...n Deep Ensembles via the Neural Tangent Kernel | 1 + ... a Probabilistic Perspective of Generalization | 1 + ...ning for the Few-Shot Setting via Deep Kernels | 1 + ...type Mean Field Multi-agent Imitation Learning | 1 + .../Bayesian Optimization for Iterative Learning | 1 + .../Bayesian Optimization of Risk Measures | 1 + ...c Numerical Integration with Tree-Based Models | 1 + data/2020/neurips/Bayesian Pseudocoresets | 1 + ...ian Robust Optimization for Imitation Learning | 1 + ...n-adaptive neural network optimization methods | 1 + .../neurips/Belief Propagation Neural Networks | 1 + ...overy in POMDPs using the Value of Information | 1 + ...odels over time, and the Neural-Adjoint method | 1 + ...ng Interpretability in Time Series Predictions | 1 + ...ulti-Hop Logical Reasoning in Knowledge Graphs | 1 + ... 
Pedestrian Detection from Another Perspective | 1 + ...trix Regret via Parameter-Free Online Learning | 1 + ...r Set Representations For Relational Reasoning | 1 + ...rks: Current Limitations and Effective Designs | 1 + ... Interactive Summaries of Actionable Recourses | 1 + ...ng for Over-parameterized Tensor Decomposition | 1 + ...ntees with Arbitrary Adversarial Test Examples | 1 + ...CNNs and humans by measuring error consistency | 1 + ...Processes Improve the Predictive Uncertainties | 1 + ...r Learning Energy-based Latent Variable Models | 1 + ...regret bounds for adversarial bandits and MDPs | 2 ++ ... Convolutional Poisson Gamma Dynamical Systems | 1 + .../Big Bird: Transformers for Longer Sequences | 1 + ...sed Models are Strong Semi-Supervised Learners | 1 + ...ough dynamic inversion of feedforward networks | 1 + ...Inspired Mechanisms for Adversarial Robustness | 1 + ...ing: A Functional Optimization Based Framework | 1 + ... Optimization with Local Generative Surrogates | 1 + ...odels using generative evolutionary algorithms | 1 + ...ideo Temporal Consistency via Deep Video Prior | 1 + ...e Scene Representations from Unlabelled Images | 1 + ...or Efficient Monte-Carlo Bayesian Optimization | 1 + ...dversarial Training with Hypersphere Embedding | 1 + ...tive: New Schemes with Faster Worst-Case Rates | 1 + ...t - A New Approach to Self-Supervised Learning | 1 + data/2020/neurips/Bootstrapping neural processes | 1 + ...ry thickness and robustness in learning models | 1 + ... Embedding Model for Knowledge Base Completion | 1 + ... Langevin Dynamics for Non-Convex Optimization | 1 + ...ng the Communication-Privacy-Accuracy Trilemma | 1 + ...Reinforcement Learning with a Generative Model | 1 + ...ty for Model-Based Deep Reinforcement Learning | 1 + ...One-shot Neural Architecture Search with BONAS | 1 + ...eural networks with structural message-passing | 1 + ...tine Resilient Distributed Multi-Task Learning | 1 + ...arization via Auxiliary Causal Graph Discovery | 1 + ...tworks with Scalable and Consistent Estimation | 1 + ...ural Architecture Search for Image Restoration | 1 + ...t Embeddings from Narrated Instructional Video | 1 + ...sformer for Video-Text Representation Learning | 1 + .../COPT: Coordinated Optimal Transport on Graphs | 1 + ...g Sequential Data via Causal Optimal Transport | 1 + ...: Communication-efficient SGD with Error Reset | 1 + ...Learning on Distributionally Shifted Instances | 1 + ...cal Spatiotemporal Point Cloud Representations | 1 + ...able Regression using Maximum Mean Discrepancy | 1 + .../Calibrating CNNs for Lifelong Learning | 1 + ...ibrating Deep Neural Networks using Focal Loss | 1 + ... General Sum Partially Observable Markov Games | 1 + ...Can Graph Neural Networks Count Substructures? | 1 + ...ess with Unlabeled Data and Bayesian Inference | 1 + ...Stochastic Convex Optimization as a Case Study | 1 + ...alizable Branching Heuristic for a SAT Solver? | 1 + ...ning Learn Representation? A Mean-Field Theory | 3 +++ ... 
Backpropagation in Predictive Coding Networks | 1 + ...ense weakly-supervised category reconstruction | 1 + ...caded Text Generation with Markov Transformers | 1 + ...Unknown Targets: Characterization and Learning | 1 + ...usal Discovery in Physical Systems from Videos | 1 + .../Causal Estimation with Functional Confounders | 1 + ...on for Weakly-Supervised Semantic Segmentation | 1 + ...plain Individual Predictions of Complex Models | 2 ++ .../Causal analysis of Covid-19 Spread in Germany | 1 + ...y Robust Detection of Out-of-Distribution Data | 1 + ...Image Transformations via Randomized Smoothing | 1 + .../neurips/Certified Monotonic Neural Networks | 1 + ...Graph Classification under Topological Attacks | 1 + ...Certifying Confidence via Randomized Smoothing | 1 + .../Certifying Strategyproof Auction Networks | 1 + ...Optimism: Volume Analysis of Learning in Games | 3 +++ ...licies: Where to Intervene and What to Observe | 1 + ... of candidate learning rules for deep networks | 1 + data/2020/neurips/Choice Bandits | 1 + ... Adversarial Learning across Spherical Circles | 1 + ...s, Generalized Linear Models, and Evolvability | 1 + ...lassification with Valid and Adaptive Coverage | 1 + ...ntization Gap: PixelCNN as a Single-Layer Flow | 1 + data/2020/neurips/Co-Tuning for Transfer Learning | 1 + ...xposure Maximization in Online Social Networks | 1 + ...ution Networks for Co-Salient Object Detection | 1 + ...ltimodal Image Representation for Registration | 1 + .../neurips/CoSE: Compositional Stroke Embeddings | 1 + ...For Function-Level Binary Source Code Matching | 1 + ...Matrix Multiplication For Straggler Mitigation | 1 + .../neurips/CogLTX: Applying BERT to Long Texts | 1 + ...sign for COVID-19 Using Deep Generative Models | 1 + ...erarchical Multi-Label Classification Networks | 1 + ...ICE: Off-Policy Confidence Interval Estimation | 1 + ...actical Private Mean and Covariance Estimation | 1 + ...anguage GANs with Cautious Sampling Strategies | 1 + ...heir Application to Public Health Intervention | 1 + data/2020/neurips/Collegial Ensembles | 1 + ...usions: A Statistics-based Computational Model | 1 + ...ing and Search for Imperfect-Information Games | 1 + ...evolving graphs with a dynamical Bethe-Hessian | 1 + ...dinality semidefinite programming\342\200\251" | 1 + ...ormative model for higher-order brain activity | 1 + .../neurips/Comparator-Adaptive Convex Bandits | 1 + ...Understanding Gradient Flow in Phase Retrieval | 1 + .../neurips/Compositional Explanations of Neurons | 1 + ...eralization by Learning Analytical Expressions | 1 + ...eralization via Neural-Symbolic Stack Machines | 1 + ...nal Visual Generation with Energy Based Models | 1 + ...ing via Fine-Grained Dense Feature Composition | 1 + ...llation for Weakly-Supervised Object Detection | 1 + ...t Representations with Relative Entropy Coding | 1 + ... Selective Inference using Dynamic Programming | 1 + ...ve Information-Theoretic Generalization Bounds | 1 + ...nce sequences for sampling without replacement | 1 + ...ormal Symplectic and Relativistic Optimization | 1 + ...ion in Infinite-Horizon Reinforcement Learning | 1 + ...timization over Positive Semidefinite Matrices | 1 + data/2020/neurips/Consequences of Misaligned AI | 1 + ... 
Q-Learning for Offline Reinforcement Learning | 1 + ...r Certified Robustness of Smoothed Classifiers | 1 + ...etric Mixture Models from Grouped Observations | 1 + ...sifiers for Complex Objectives and Constraints | 1 + ...l Relation Learning for Zero-Shot Segmentation | 1 + ...re selection for analytic deep neural networks | 2 ++ ... for Compressed Sensing with Generative Priors | 1 + ...arning in concave-convex and knapsack settings | 1 + ...rence with Geometric Jensen-Shannon Divergence | 1 + ... and Coordination in Recommendation Ecosystems | 1 + ...es: Multi-Agent Learning with Side Information | 1 + ...tion in Auctions via Mixed Integer Programming | 1 + ...by Functional Regularisation of Memorable Past | 1 + ...nual Learning in Low-rank Orthogonal Subspaces | 1 + ...l Primitives : Skill Discovery via Reset-Games | 1 + ...Mixed Sequence of Similar and Dissimilar Tasks | 1 + ...nce based Adaptive Group Sparse Regularization | 1 + .../Continuous Meta-Learning without Tasks | 1 + ...View Synthesis without Target View Supervision | 1 + ...Continuous Regularized Wasserstein Barycenters | 1 + ...bmodular Maximization: Beyond DR-Submodularity | 1 + data/2020/neurips/Continuous Surface Embeddings | 1 + ...tive Learning for Conditional Image Generation | 1 + ...Contrastive Learning with Adversarial Examples | 1 + ...al image segmentation with limited annotations | 1 + ...oving BERT with Span-based Dynamic Convolution | 1 + ... Convolutional Networks on Large Random Graphs | 1 + ...sk-Specific Adaptation over Partial Parameters | 1 + ...tion based on global lower second-order models | 1 + ...Convolutional Generation of Textured 3D Meshes | 1 + ...Tensor-Train LSTM for Spatio-Temporal Learning | 1 + ...tive Heterogeneous Deep Reinforcement Learning | 1 + .../Cooperative Multi-player Bandit Optimization | 1 + .../neurips/Coresets for Near-Convex Functions | 2 ++ .../Coresets for Regressions with Panel Data | 1 + ...g of Deep Neural Networks against Noisy Labels | 0 ...imization for Continual Learning and Streaming | 1 + .../Correlation Robust Influence Maximization | 1 + ...ence learning via linearly-invariant embedding | 1 + ...e-Guided Learning of Monotonic Neural Networks | 1 + ...or Weakly-Supervised Vision-Language Grounding | 1 + ...a Augmentation using Locally Factored Dynamics | 1 + ...Counterfactual Prediction for Bundle Treatment | 1 + ...rfactual Predictions under Runtime Confounding | 1 + ...nd-Language Navigation: Unravelling the Unseen | 1 + ...rks Are Universal Diffeomorphism Approximators | 1 + ... Paths For One-Shot Neural Architecture Search | 1 + data/2020/neurips/Critic Regularized Regression | 1 + ...raph Neural Network for Image Super-Resolution | 1 + ...trieval for Iterative Self-Supervised Training | 1 + ...validation Confidence Intervals for Test Error | 1 + ...ransformers: spatially-aware few-shot transfer | 1 + ...tructured Bandits Beyond Asymptotic Optimality | 1 + data/2020/neurips/Curriculum By Smoothing | 1 + ...rriculum Learning by Dynamic Instance Hardness | 1 + ...for multilevel budgeted combinatorial problems | 1 + ...ation to Prevent Distortion in Graph Embedding | 1 + ... Self-Supervised Video Representation Learning | 1 + ...us Optimization for Learning Bayesian Networks | 1 + ...: Learning local features with policy gradient | 1 + ...es for Enhanced Robust Generation of Ensembles | 1 + ... 
Continual Learning: a Strong, Simple Baseline | 1 + ...Simple Strategy For Neural Machine Translation | 1 + ...ing Text by Fingerprinting Language Generation | 1 + data/2020/neurips/Debiased Contrastive Learning | 1 + ...stic Gradient Descent to handle missing values | 1 + ... Surrogate Sketching and Scaled Regularization | 1 + .../Debugging Tests for Model Explanations | 1 + ...tralized Accelerated Proximal Gradient Descent | 1 + ...alized Langevin Dynamics for Bayesian Learning | 1 + ...ion Approximation and its Finite-Time Analysis | 1 + ...o characterize their generalization properties | 1 + ...on-Making with Auto-Encoding Variational Bayes | 1 + ...terfactual Explanations and Strategic Behavior | 1 + data/2020/neurips/Deep Archimedean Copulas | 1 + data/2020/neurips/Deep Automodulators | 1 + ...iant Wasserstein Distributional Classification | 1 + .../2020/neurips/Deep Direct Likelihood Knockoffs | 1 + ...Energy-based Modeling of Discrete-Time Physics | 1 + data/2020/neurips/Deep Evidential Regression | 1 + ...phical model for improved animal pose tracking | 1 + ...ion Learning for Bimanual Robotic Manipulation | 1 + .../Deep Metric Learning with Spherical Embedding | 1 + .../Deep Multimodal Fusion by Channel Exchanging | 1 + ...d Particle Filters for Time Series Forecasting | 1 + ...ed Hierarchical Attention for Text-based Games | 1 + .../Deep Reinforcement and InfoMax Learning | 1 + ...odeling via Graph Poisson Gamma Belief Network | 1 + ...ed Shape Correspondence with Optimal Transport | 1 + ...ep Smoothing of the Implied Volatility Surface | 1 + data/2020/neurips/Deep Statistical Solvers | 1 + ... Models for Tractable Counterfactual Inference | 1 + ...eep Subspace Clustering with Data Augmentation | 1 + .../Deep Transformation-Invariant Clustering | 1 + .../neurips/Deep Transformers with Latent Depth | 1 + .../Deep Variational Instance Segmentation | 1 + ...iener Meets Deep Learning for Image Deblurring | 1 + ...ive inference agents using Monte-Carlo methods | 1 + ...he time evolution of the Neural Tangent Kernel | 1 + ...ruction of strange attractors from time series | 1 + ...to-Image Translation by Transferring from GANs | 1 + ...nerative Network for Vector Graphics Animation | 1 + ...Learned Spectral Total Variation Decomposition | 1 + ...nd Cooperation in Nonstochastic Linear Bandits | 1 + ...l Networks using Structured Response Jacobians | 1 + ...m in Semi-supervised Video Object Segmentation | 1 + ...ural population data from multiple brain areas | 1 + ...Demystifying Orthogonal Monte Carlo and Beyond | 1 + ... A Provable Defense for Pretrained Classifiers | 1 + .../Denoising Diffusion Probabilistic Models | 1 + ...rning Transformation Synchronization on Graphs | 1 + .../neurips/Depth Uncertainty in Neural Networks | 1 + .../Design Space for Graph Neural Networks | 1 + ...s and Recognizing Physical Contact in the Wild | 1 + ... from Neural Networks via Topological Analysis | 0 ...rtified Object Detection with Median Smoothing | 0 ...imization over a Matroid in Nearly Linear Time | 1 + ...a: Learning Visual Dialog Agents from VQA Data | 1 + ...tial Operators and Algebraic Multigrid Pooling | 1 + ...e Augmentation for Data-Efficient GAN Training | 1 + ...able Causal Discovery from Interventional Data | 1 + ...Parallel Multi-Objective Bayesian Optimization | 1 + ...ifferentiable Meta-Learning of Bandit Policies | 1 + ... 
Equivalent Space with Exploration Enhancement | 1 + .../Differentiable Top-k with Optimal Transport | 1 + ...Private Clustering: Tight Approximation Ratios | 2 ++ ...ifferentially-Private Federated Linear Bandits | 1 + .../Digraph Inception Convolutional Networks | 1 + ...o Modern Deep Learning Tasks and Architectures | 1 + ...mization of Policies in Discrete Action Spaces | 1 + .../Directional Pruning of Deep Neural Networks | 1 + ...nal convergence and alignment in deep learning | 1 + .../Dirichlet Graph Variational Autoencoder | 1 + ...Gradient Estimator for Binary Latent Variables | 1 + ...forcement Learning via Distribution Correction | 1 + .../Discovering Reinforcement Learning Algorithms | 1 + ...odels from Deep Learning with Inductive Biases | 1 + ...covering conflicting groups in signed networks | 1 + ...ation via Self-supervised Audiovisual Matching | 1 + ...Ground Truth in Segmentation of Medical Images | 1 + .../neurips/Disentangling by Subspace Diffusion | 1 + ... Learning for Accurate Optical Flow Estimation | 1 + data/2020/neurips/Dissecting Neural ODEs | 1 + ...ral Networks for Graph Representation Learning | 1 + ...istributed Distillation for On-Device Learning | 1 + ... Communicate Less and Resist Byzantine Workers | 1 + ...ta: Bridging Median- and Mean-Based Algorithms | 1 + ...-label for Imbalanced Semi-supervised Learning | 1 + .../Distribution Matching for Crowd Counting | 1 + ...ion sets, confidence intervals and calibration | 1 + ...with IPMs and links to Regularization and GANs | 1 + .../Distributionally Robust Federated Averaging | 1 + ...st Local Non-parametric Conditional Estimation | 1 + ...obust Parametric Maximum Likelihood Estimation | 1 + ...ioning with Context-Object Split Latent Spaces | 1 + ...versification for White- and Black-box Attacks | 1 + ...e Bayesian Optimization With Batch Evaluations | 1 + ...rially Robust ImageNet Models Transfer Better? | 1 + ...tion Learning Help Neural Architecture Search? | 1 + ... as a Problem of Inference on Graphical Models | 1 + ...tribution Matching and Generalized Label Shift | 1 + ...fication with Linear-Dependency Regularization | 2 ++ ...main Generalization via Entropy Regularization | 1 + ...Gradient Estimation for Deterministic Policies | 1 + .../neurips/Dual Instrumental Variable Regression | 1 + ...ense against Lp and non-Lp Adversarial Attacks | 1 + ... for Transition Matrix in Label-noise Learning | 1 + ...ntralized Optimization with Variance Reduction | 1 + .../Dual-Resolution Correspondence Networks | 1 + ...Factorization Based Knowledge Graph Completion | 1 + ...RT: Dynamic BERT with Adaptive Width and Depth | 1 + ...nd Verbal Narrations in Knowledge-rich Domains | 1 + .../Dynamic Regret of Convex and Smooth Functions | 1 + ...cy Optimization in Non-Stationary Environments | 1 + data/2020/neurips/Dynamic Submodular Maximization | 1 + ...ted memory resources in reinforcement learning | 1 + ...ent descent in Gaussian mixture classification | 1 + ...rization Prevents Memorization of Noisy Labels | 1 + ...s Under Extreme Budget and Network Constraints | 1 + ... 
Faster Regularized Least-Squares Optimization | 1 + ...ity in Population Based Reinforcement Learning | 1 + ...ms for Device Placement of DNN Graph Operators | 1 + ...d On A Unified View Of $K$-means And Ratio-cut | 1 + ...r Stretched Mixtures: Landscape and Optimality | 1 + ...ent Contextual Bandits with Continuous Actions | 1 + ...ed High-Dimensional Distributions via Learning | 6 ++++++ ...xact Verification of Binarized Neural Networks | 1 + ...inforcement Learning via Bayesian Optimization | 1 + ... Objects with Constrained Adversarial Networks | 1 + ...fficient Learning of Discrete Graphical Models | 1 + ...ve Models via Finite-Difference Score Matching | 1 + ...sian Variational Inference for Neural Networks | 1 + ...e and Structured Latent Variables via Sparsity | 1 + ... through Optimistic Policy Search and Planning | 1 + ...ian Optimization via One-Shot Multi-Step Trees | 1 + ... Dimensionality Reduction via Gradient Descent | 1 + ...e MDPs with Weak Linear Function Approximation | 1 + ...tion-free Algorithms for Saddle Point Problems | 1 + ...parse Deep Learning with Theoretical Guarantee | 1 + ...sparse halfspaces with arbitrary bounded noise | 1 + ... of neural tuning during naturalistic behavior | 1 + ...ased inference for binary and multi-class MRFs | 1 + ...presentation Learning in Class-Imbalanced Data | 1 + ...: Protecting SignSGD against Byzantine Attacks | 1 + ...t Transfer via Unsupervised Environment Design | 1 + ...n from Randomized Uncertain Social Preferences | 1 + .../Empirical Likelihood for Contextual Bandits | 1 + ... via memory-efficient semidefinite programming | 1 + .../End-to-End Learning and Intervention in Games | 1 + .../Energy-based Out-of-distribution Detection | 1 + ... for Robust Model Fusion in Federated Learning | 2 ++ ...ophysical models with Bayesian Neural Networks | 1 + .../Ensuring Fairness Beyond the Training Data | 1 + ...nce: Identifiability and Finite Sample Results | 1 + ...Unbalanced Gaussian Measures has a Closed Form | 1 + ...ergence of iterative methods for eigenproblems | 1 + ...uivariant Networks for Hierarchical Structures | 0 ...ework for Combinatorial Optimization on Graphs | 1 + ... Bounds of Imitating Policies and Environments | 1 + ...int Faster under Interpolation-like Conditions | 1 + .../Escaping the Gravitational Pull of Softmax | 1 + ...ural Representations of Uncertain Environments | 1 + ...rom Heavy-Tailed Noise via Self-Avoiding Walks | 1 + ...ing Data Influence by Tracing Gradient Descent | 1 + ...ability with polylogarithmic sample complexity | 2 ++ ...ventions using Generative Adversarial Networks | 1 + .../Estimating weighted areas under the ROC curve | 1 + ...mation of Skill Distribution from a Tournament | 1 + ...aluating Attribution for Graph Neural Networks | 1 + ...g Teamwork Using Cooperative Game Abstractions | 1 + ...with Hybrid-Cylindrical-Spherical Voxelization | 1 + ...Spaces in Conditional Variational Autoencoders | 1 + ...y Prediction with Dynamic Relational Reasoning | 1 + ...al Planning for Vision-and-Language Navigation | 1 + .../Evolving Normalization-Activation Layers | 1 + ... of Mangled Clusters with Same-Cluster Queries | 1 + ...cit regularization via surrogate random design | 1 + ... 
the Local Lipschitz Constant of ReLU Networks | 1 + .../Exchangeable Neural ODE for Set Modeling | 1 + data/2020/neurips/Exemplar Guided Active Learning | 1 + ...rest Neighbor Retrieval, and Data Augmentation | 1 + ...zation to Train Compact Convolutional Networks | 1 + ...imental design for MRI by greedy policy search | 1 + ...ing for Offline Policy Learning and Evaluation | 1 + data/2020/neurips/Explainable Voting | 1 + ...ear Classifiers with Polynomial Time and Delay | 1 + ...it Regularisation in Gaussian Noise Injections | 1 + ...ative-free Optimization and Continuous Bandits | 1 + ... Fair and Transferable Representation Learning | 1 + ...rogate Gap in Online Multiclass Classification | 1 + ...ual patterns to learn from partial annotations | 1 + ...radient Methods with Variable Stepsize Scaling | 1 + ...st Neighbour and Its Improved Convergence Rate | 1 + ...y and Representation Learning of Low Rank MDPs | 1 + data/2020/neurips/Factor Graph Grammars | 1 + data/2020/neurips/Factor Graph Neural Networks | 1 + .../Factorizable Graph Convolutional Networks | 1 + ...ocesses: K-Shot Prediction of Neural Responses | 1 + data/2020/neurips/Fair Hierarchical Clustering | 2 ++ ...ple Decision Making Through Soft Interventions | 1 + .../neurips/Fair Performance Metric Elicitation | 1 + ... and recalibration with statistical guarantees | 1 + .../Fair regression with Wasserstein barycenters | 1 + ... help exact inference in structured prediction | 1 + ...bmodular Maximization: Algorithms and Hardness | 1 + ...verlapping Groups; a Probabilistic Perspective | 0 ...hics through Adversarially Reweighted Learning | 1 + ...Faithful Embeddings for Knowledge Base Queries | 1 + ...con: Fast Spectral Inference on Encrypted Data | 1 + ... Maximization Subject to a Knapsack Constraint | 1 + ...namics on Manifold: Geodesics meet Log-Sobolev | 1 + ...Distributionally Robust Support Vector Machine | 1 + data/2020/neurips/Fast Fourier Convolution | 1 + ...o Gaussian Processes and Bayesian Optimization | 1 + .../Fast Transformers with Clustered Attention | 1 + .../Fast Unbalanced Optimal Transport on a Tree | 1 + ...nd Accurate $k$-means++ via Rejection Sampling | 1 + ... Temporal Point Processes with Triangular Maps | 1 + ...Fast geometric learning with symbolic matrices | 1 + ...ls for Tabular Data via Augmented Distillation | 1 + ...aster DBSCAN via subsampled similarity queries | 1 + ...ergence Analysis of Discretized Langevin MCMC" | 1 + ...or Point Methods for Tall Wide Linear Programs | 1 + ...stance Estimation with the Sinkhorn Divergence | 1 + .../Feature Importance Ranking for Deep Learning | 1 + ...ave Shifted via Conditional Distribution Tests | 1 + ...hmic framework for fast federated optimization | 1 + ...erated Accelerated Stochastic Gradient Descent | 1 + ...ed Bayesian Optimization via Thompson Sampling | 1 + .../Federated Principal Component Analysis | 1 + ...ject Detection with Adversarial-Paced Learning | 1 + ...e Generation with Elastic Weight Consolidation | 1 + ...ning with Meta-Analogical Contrastive Learning | 1 + ...etric Learning Perspective Using Fewer Proxies | 1 + ...mes: Continuous Time Analysis and Applications | 2 ++ ...wise Learning for Multi-field Categorical Data | 1 + ... 
Behavioral Cloning from Observation Histories | 1 + ...vex Linearly Constrained Optimization Problems | 1 + ...gy of Decision Boundaries with Active Learning | 1 + ...Fine-Grained Dynamic Head for Object Detection | 1 + ...c Reconstruction via Biodiversity Optimization | 1 + data/2020/neurips/Finite Continuum-Armed Bandits | 1 + ...s Infinite Neural Networks: an Empirical Study | 1 + .../Finite-Time Analysis for Double Q-learning | 1 + ...tion with Multiple Plays and Markovian Rewards | 1 + ...a General Approach for Growing Neural Networks | 1 + ...Order Constrained Optimization in Policy Space | 1 + ...for Large-Scale Market Equilibrium Computation | 1 + ...vised Learning with Consistency and Confidence | 1 + ...ers: Computational Hardness and Fast Algorithm | 1 + .../FleXOR: Trainable Fractional Quantization | 1 + ...xtures of non-overlapping exponential families | 1 + ...neous manifold learning and density estimation | 1 + ...proves Information Transfer in Visual Features | 1 + ...t Parallel Algorithms for Smooth Minimax Games | 1 + ...Forethought and Hindsight in Credit Assignment | 1 + ... Depth Estimators with MED Probability Volumes | 1 + ...Frequency Functions in Low Dimensional Domains | 1 + ...everage Scores and Approximate Kernel Learning | 3 +++ ...Discrepancies in Deep Network Generated Images | 1 + ...stability of deep learning models for genomics | 1 + ...rally and Spatially for Efficient DNN Training | 1 + ...ann Machines to Neural Networks and Back Again | 1 + ...s to Decisions: Using Lookahead Regularization | 1 + ...s and Back: Hyperbolic Hierarchical Clustering | 1 + ...ML Prediction APIs more accurately and cheaply | 1 + ...oder using Efficient Spatially Varying Kernels | 1 + ...orithm for Constrained Submodular Optimization | 1 + ...on Learning: A Unified Theoretical Perspective | 1 + ...l Redundancy for Efficient Language Processing | 1 + ... Outlier Detection with Deep Generative Models | 1 + ...ing rule derived from backpropagation of error | 1 + data/2020/neurips/GAN Memory with No Forgetting | 1 + ...NSpace: Discovering Interpretable GAN Controls | 1 + ...ing \"When to Sample\" from \"How to Sample\"" | 1 + ...inatorial Algorithms over Billion-sized Graphs | 1 + ...ph Neural Networks against Adversarial Attacks | 1 + ...orrespondence Volumes into Your Neural Network | 1 + ...PS-Net: Graph-based Photometric Stereo Network | 1 + ... for Extremely Fast Large-Scale Classification | 1 + ...e Radiance Fields for 3D-Aware Image Synthesis | 1 + ...for Learning Differentially Private Generators | 1 + ...rence Learning for Infinite-Horizon Prediction | 1 + data/2020/neurips/Gaussian Gated Linear Networks | 1 + ...ion of the Thermodynamic Variational Objective | 1 + ...unctions for Causal Effect Estimation from IVs | 0 ...ty of Soft Interventions: Completeness Results | 1 + ... 
Bayesian Filtering via Sequential Monte Carlo | 1 + ...radient Descent for Non-Convex Metric Learning | 1 + ...tion by infinite dimensional Langevin dynamics | 1 + ...proaching Bayes error with convex optimization | 1 + data/2020/neurips/Generalized Boosting | 0 ...uted Bounding Boxes for Dense Object Detection | 1 + ...neralized Hindsight for Reinforcement Learning | 1 + ...n for Estimating Latent Variable Causal Graphs | 1 + ...ed Leverage Score Sampling for Neural Networks | 4 ++++ ...ubgoals in Hierarchical Reinforcement Learning | 1 + ...rs for Progressive Matrices Intelligence Tests | 1 + ...ve 3D Part Assembly via Dynamic Graph Learning | 1 + .../neurips/Generative Neurosymbolic Machines | 1 + ...rom Single-view Semantics to Novel-view Images | 1 + ...e causal explanations of black-box classifiers | 1 + ...Functions for Single-view Human Reconstruction | 1 + ...Geometric All-way Boolean Tensor Decomposition | 1 + ...metric Dataset Distances via Optimal Transport | 1 + .../Geometric Exploration for Online Control | 1 + data/2020/neurips/Gibbs Sampling with People | 1 + ...ing Spatial Redundancy in Image Classification | 1 + ...Class of Nonconvex-Nonconcave Minimax Problems | 1 + ... One Wide Layer Followed by Pyramidal Topology | 1 + ... Text-to-Speech via Monotonic Alignment Search | 1 + ...raining Deep Neural Networks on Encrypted Data | 2 ++ ... Structures with Conditional Generative Models | 1 + ...Regularization Method for Deep Neural Networks | 1 + .../neurips/Gradient Boosted Normalizing Flows | 1 + ...ient Estimation with Stochastic Softmax Tricks | 1 + ...rized V-Learning for Dynamic Treatment Regimes | 1 + .../Gradient Surgery for Multi-Task Learning | 1 + .../neurips/Gradient-EM Bayesian Meta-Learning | 1 + ...o Unsupervised Graph Matching Network Learning | 1 + ...N: Deep 3D Texture Synthesis From 2D Exemplars | 1 + ...aph Cross Networks with Vertex Infomax Pooling | 1 + .../neurips/Graph Geometry Interaction Learning | 1 + data/2020/neurips/Graph Information Bottleneck | 1 + .../Graph Meta Learning via Local Subgraphs | 1 + ...ork for Transferable Active Learning on Graphs | 1 + ...etworks for Semi-Supervised Learning on Graphs | 1 + ...c Neural Networks for Semi-supervised Learning | 1 + ...d the Transferability of Graph Neural Networks | 1 + ...Solution for Visual Learning of Robotic Grasps | 1 + ...ogarithmic Number of Winning Tickets is Enough | 1 + ... inference with structure-exploiting lazy maps | 1 + ...ol: Distortion-Aware Sparse Adversarial Attack | 1 + .../Group Contextual Encoding for 3D Point Clouds | 1 + ...: Federated Learning of Large CNNs at the Edge | 1 + ...roup-Fair Online Allocation in Continuous Time | 1 + ... Evaluating and Enhancing Adversarial Defenses | 1 + ...olecular Optimization with Genetic Exploration | 1 + ...naptic plasticity with Hebbian Memory Networks | 1 + ...trace-Weighted Quantization of Neural Networks | 1 + ...earest Neighbor Search on Heterogeneous Memory | 1 + ...ating and Decomposing Human-Object Interaction | 1 + ...HRN: A Holistic Approach to One Class Learning | 1 + ...: Pruning Adversarially Robust Neural Networks | 1 + ...nference for latent Gaussian models and beyond | 0 ...issing Data with Graph Representation Learning | 1 + ...sis for Cross-domain Shape Similarity Learning | 1 + .../Hard Negative Mixing for Contrastive Learning | 1 + .../Hard Shape-Constrained Kernel Machines | 1 + ... 
Learning Neural Networks with Natural Weights | 1 + ...y Tails, and Generalization in Neural Networks | 1 + ...xt Polarity Classification & Data Augmentation | 1 + ...aster convergence of external and swap regrets | 4 ++++ data/2020/neurips/Heuristic Domain Adaptation | 1 + ...r Efficient and High Fidelity Speech Synthesis | 1 + ...ent Memory with Optimal Polynomial Projections | 1 + ...ess Priors for Bayesian Neural Network Weights | 1 + .../Hierarchical Granularity Transfer Learning | 1 + ...l Architecture Search for Deep Stereo Matching | 1 + ...Generating Diverse Videos from a Single Sample | 1 + ...g for Compositional Generalization in Language | 1 + .../neurips/Hierarchical Quantized Autoencoders | 1 + ...ierarchical nucleation in deep neural networks | 1 + ...or Exploratory Search in Morphogenetic Systems | 1 + ...n Optimization via Nested Riemannian Manifolds | 1 + ...wn Context Rewards using Bayesian Optimization | 1 + .../High-Dimensional Sparse Linear Bandits | 1 + .../High-Fidelity Generative Image Compression | 1 + .../neurips/High-Throughput Synchronous Deep RL | 1 + ...f deep neural network models of visual cortex" | 1 + ...correlated time series with latent confounders | 1 + ...r-Order Certification For Randomized Smoothing | 1 + ...r-Order Spectral Clustering of Directed Graphs | 1 + ...ction for Maximizing Expected Order Statistics | 2 ++ ...riminative features on deep network boundaries | 1 + ...udy of Deep Neural Network Explanation Methods | 1 + ...pharmaceutical Interventions against COVID-19? | 1 + ...air decisions fare in long-term qualification? | 1 + ...rpretable Attribution for Feature Interactions | 1 + ...eneralisation Ability of Deep Neural Networks? | 0 ...distinguish graphs with graph neural networks? | 1 + ...itial point worth in Low-rank Matrix Recovery? | 1 + ...verparameterized Convolutional Neural Networks | 1 + ... Action-Gradient-Estimator Policy Optimization | 1 + ...e Image to 3D Human via Cross-View Consistency | 1 + ...ith Hybrid Similarity Measure and Triplet Loss | 1 + .../neurips/Hybrid Models for Learning to Branch | 1 + ...inimax Problems with Nonconvex-Linear Function | 1 + ...ersolvers: Toward Fast Continuous-Depth Models | 1 + ...epresentations and Feature Attribution Mapping | 1 + ...nergy-Based Deep Models Based on Nonlinear ICA | 1 + ... Correlation Network for Co-Saliency Detection | 1 + ...alized Accelerated Augmented Lagrangian Method | 1 + ...nt Neural Architecture Search by Sparse Coding | 1 + ...nference Failure with Uncertainty-Aware Models | 1 + ...Learning Rules From Neural Network Observables | 1 + ...d Data using the Area Under the Margin Ranking | 1 + ...n activity with Gaussian process factor models | 1 + ... Algorithm Configuration from an Infinite Pool | 1 + ...ion: Initialization Scale vs Training Accuracy | 1 + ...Implicit Distributional Reinforcement Learning | 1 + data/2020/neurips/Implicit Graph Neural Networks | 1 + ...esentations with Periodic Activation Functions | 1 + .../neurips/Implicit Rank-Minimizing Autoencoder | 1 + ... Deep Learning May Not Be Explainable by Norms | 1 + ... 
Results for Grammar-Compressed Linear Algebra | 3 +++ ...rithms for Convex-Concave Minimax Optimization | 1 + ...lar Maximization via First-order Regret Bounds | 1 + ...lipping Algorithms for Non-convex Optimization | 1 + ...uarantees for k-means++ and k-means++ Parallel | 1 + ...for Incremental Autonomous Exploration in MDPs | 1 + ...es for Episodic Memory-based Lifelong Learning | 1 + ...ues for Training Score-Based Generative Models | 1 + ... Phylogenetic Inference with Normalizing Flows | 1 + ...Column Subset Selection and the Nystrom method | 1 + ...o-Augment via Augmentation-Wise Weight Sharing | 1 + ...bability Ratio Clipping and Sample Reweighting | 1 + ...forcement Learning with Mixture Regularization | 1 + ...proving Inference for Neural Image Compression | 1 + ...dentifiability in Probabilistic Box Embeddings | 1 + ... Tasks with Human Gaze-Guided Neural Attention | 1 + ...twork Training in Low Dimensional Random Bases | 1 + ... Sequential Decision Making and ML Predictions | 1 + ...-Constrained Kidney Exchange via Pre-Screening | 1 + ...y Bounds for (Natural) Actor-Critic Algorithms | 1 + ...ctor Technique with Renyi Differential Privacy | 1 + ... with accuracy versus uncertainty optimization | 1 + ...mmon corruptions by covariate shift adaptation | 1 + ...In search of robust measures of generalization | 1 + ... into Parallel Sequence Decoding with Adapters | 1 + ...Output Constraints in Bayesian Neural Networks | 1 + ...Reasoning Communication into Emergent Language | 1 + ...Methods for Competitive Reinforcement Learning | 1 + data/2020/neurips/Inductive Quantum Embedding | 1 + ...on for Cross-scenario 3D Human Pose Estimation | 1 + data/2020/neurips/Inference for Batched Bandits | 1 + ...ing learning rules from animal decision-making | 1 + ...ented Online Planning for Complex Environments | 1 + ...Information Maximization for Few-Shot Learning | 0 ...l Learning from Missing-Not-At-Random Feedback | 1 + ...tic Regret Bounds for Online Nonlinear Control | 1 + ...ion theoretic limits of learning a sparse rule | 1 + ...Task Selection for Meta-Reinforcement Learning | 1 + .../neurips/Input-Aware Dynamic Backdoor Attack | 1 + ...d Approximations to Profile Maximum Likelihood | 1 + data/2020/neurips/Instance Selection for GANs | 1 + ...based Generalization in Reinforcement Learning | 1 + ...via approximate inverse sensitivity mechanisms | 1 + data/2020/neurips/Instance-wise Feature Grouping | 1 + ...rning, Automatically Synthesize Fast Gradients | 1 + ...terferometer by a reinforcement learning agent | 1 + ...t Solving for LP-based prediction+optimisation | 1 + ... Speed Up Gradients Propagation in Neural ODEs | 1 + ...ble Sequence Learning for Covid-19 Forecasting | 1 + ...olicies from Heterogeneous User Demonstrations | 1 + ...ng fMRI responses to continuous natural speech | 1 + ...ent Architecture for Knowledge Graph Embedding | 1 + .../2020/neurips/Interventional Few-Shot Learning | 1 + ...for Calibration of Multi-Class Neural Networks | 1 + ...ocessing Methods for Debiasing Neural Networks | 1 + ...ducing Routing Uncertainty in Capsule Networks | 11 +++++++++++ data/2020/neurips/Inverse Learning of Symmetries | 1 + ...ially Observable Continuous Nonlinear Dynamics | 1 + ...arameterization: Revisiting the Gumbel-Softmax | 1 + ... is it to break privacy in federated learning? | 1 + ...anguage Models Using Causal Mediation Analysis | 1 + ...rizon RL More Difficult Than Short Horizon RL? | 1 + ...ient for Feature-based Reinforcement Learning? 
| 1 + ...ndispensable for training deep neural network? | 1 + ...al Networks: Better and Robust Node Embeddings | 1 + ...JAX MD: A Framework for Differentiable Physics | 1 + ...ntrastive Learning with Infinite Possibilities | 1 + ...agent Collaboration with Imperfect Information | 1 + data/2020/neurips/Joints in Random Forests | 1 + ...ep Multitask Models with Gradient Sign Dropout | 1 + ...ation Algorithm for $k$-center Fair Clustering | 1 + ...n for User Behavior Modeling in CTR Prediction | 1 + ... Estimator: Risk Prediction from Training Data | 1 + ...ressive Distillation for Adder Neural Networks | 1 + ... Roof: Handling Billions of Points Efficiently | 1 + ...ble 3-factor Hebbian learning in deep networks | 1 + ... Facial Expression and Action Unit Recognition | 1 + ...k Bound, Data Efficiency and Imperfect Teacher | 1 + ... Reinforcement Learning for Continuous Control | 1 + ...k for Single Image Super-resolution and Beyond | 1 + ...ard Better Generalization and Local Elasticity | 1 + ...from scratch with multi-modal self-supervision | 1 + ...ble signal propagation in feedforward networks | 1 + .../neurips/Language Models are Few-Shot Learners | 1 + ...proach for Multiscale Language Representations | 1 + ...Entity Relationship Graph for Agent Navigation | 1 + ... Imagine Goals in Curiosity Driven Exploration | 1 + ...mitation Learning for Robot Manipulation Tasks | 1 + ...or Vision-and-Language Representation Learning | 1 + ...thods for Distributionally Robust Optimization | 1 + data/2020/neurips/Latent Bandits Revisited | 1 + ...Analysis of High-Dimensional Neural Recordings | 1 + .../Latent Template Induction with Gumbel-CRFs | 1 + ...Models For Intrinsically Motivated Exploration | 1 + ... Systematically Reason Over Implicit Knowledge | 1 + ...Learnability with Indirect Supervision Signals | 1 + ...bout Objects by Learning to Interact with Them | 1 + ...for Interaction Exploration in 3D Environments | 1 + .../Learning Agent Representations for Ice Hockey | 1 + ...ugmented Energy Minimization via Speed Scaling | 1 + .../Learning Bounds for Risk-sensitive Learning | 1 + ...fects via Weighted Empirical Risk Minimization | 1 + ...ng Certified Individually Fair Representations | 1 + ...able Energy Surrogates for PDE Order Reduction | 1 + ...mpositional Rules via Neural Program Synthesis | 1 + ... from Irregularly-Sampled Partial Observations | 1 + ...ep Attribution Priors Based On Prior Knowledge | 1 + ...mable Tetrahedral Meshes for 3D Reconstruction | 1 + ...ble Programs with Admissible Neural Heuristics | 1 + ... Differential Equations that are Easy to Solve | 1 + ...odels via Auxiliary-variable Local Exploration | 1 + ... and Group Structure of Dynamical Environments | 1 + ...ed Representations of Videos with Missing Data | 1 + ...the Principle of Maximal Coding Rate Reduction | 1 + ...elief Graphs to Generalize on Text-Based Games | 1 + .../Learning Feature Sparse Principal Subspace | 1 + ...consistent with Local Contrastive Explanations | 2 ++ ... 
Structure With A Finite-State Automaton Layer | 1 + ...idance Rewards with Trajectory-space Smoothing | 1 + ...Cooperative Multi-Agent Reinforcement Learning | 1 + ...Topology-Varying Dense 3D Shape Correspondence | 1 + ...rred Communication for Multi-Agent Cooperation | 1 + ...ariances in Neural Networks from Training Data | 1 + .../Learning Invariants through Soft Unification | 1 + .../Learning Kernel Tests Without Data Splitting | 1 + ...Learning Latent Space Energy-Based Prior Model | 1 + ...earning Linear Programs from Optimal Decisions | 1 + .../Learning Loss for Test-Time Augmentation | 1 + ...d Implicitly via Explicit Heat-Kernel Learning | 1 + ...ication through Structured Attentive Reasoning | 1 + ...Target Coverage in Directional Sensor Networks | 1 + data/2020/neurips/Learning Mutational Semantics | 1 + ...ons of Multi-Object Scenes from Multiple Views | 1 + ...ions with the Decodable Information Bottleneck | 1 + .../Learning Parities with Neural Networks | 1 + ...g Physical Constraints with Neural Projections | 1 + ...sical Graph Representations from Visual Scenes | 1 + ...sentations from Audio-Visual Spatial Alignment | 1 + ...oltzmann Machines with Sparse Latent Variables | 1 + ... Knowledge with Reverse Reinforcement Learning | 1 + data/2020/neurips/Learning Rich Rankings | 1 + ...bust Decision Policies from Observational Data | 1 + ...box Optimization using Monte Carlo Tree Search | 1 + ...malization for Generative Adversarial Networks | 1 + ...aphical Models without Condition Number Bounds | 2 ++ ...Learning Sparse Prototypes for Text Generation | 1 + .../Learning Strategic Network Emergence Games | 1 + .../Learning Strategy-Aware Linear Classifiers | 1 + ...ons From Untrusted Batches: Faster and Simpler | 2 ++ ...lities and Equilibria in Non-Truthful Auctions | 1 + ...r drawing by efficient motor program induction | 1 + ...Learning by Minimizing the Sum of Ranked Range | 1 + ...al functions via multiplicative weight updates | 1 + ...g discrete distributions with infinite support | 1 + ...rete distributions: user vs item-level privacy | 3 +++ ...ndent representations with synaptic plasticity | 1 + .../neurips/Learning from Aggregate Observations | 1 + ...: De-biasing Classifier from Biased Classifier | 0 ... Proportions: A Mutual Contamination Framework | 1 + ...rom Mixtures of Private and Public Populations | 3 +++ ...d Unlabeled Data with Arbitrary Positive Shift | 1 + ... high-dimensional neural activity using pi-VAE | 1 + ...Discrete Graphical Models with Neural Networks | 1 + ...Black-Box: The pursuit of interpretable models | 1 + ...iologically plausible local wiring constraints | 1 + .../Learning the Geometry of Wave-Based Imaging | 1 + ...uadratic Regulator from Nonlinear Observations | 1 + .../neurips/Learning to Adapt to Evolving Domains | 1 + .../Learning to Approximate a Bregman Divergence | 1 + ...r Decoding of Sparse Graph-Based Channel Codes | 1 + ...hop Scheduling via Deep Reinforcement Learning | 1 + ...uction Pointer Attention Graph Neural Networks | 1 + ...sductive Few-shot Out-of-Graph Link Prediction | 1 + .../Learning to Incentivize Other Learning Agents | 1 + .../Learning to Learn Variational Semantic Memory | 1 + ...ng to Learn with Feedback and Local Plasticity | 1 + ...to Mutate with Hypergradient Guided Population | 1 + ...ent Surfaces by Self-supervised Spherical CNNs | 1 + ... 
Diplomacy with Best Response Policy Iteration | 1 + ...Play Sequential Games versus Unknown Opponents | 1 + ...rove Theorems by Learning to Generate Theorems | 1 + ...Forecast Tasks for Clinical Outcome Prediction | 0 ...ping Rewards: A New Approach of Reward Shaping | 1 + ...ficiently for causally near-optimal treatments | 1 + ... regularised problems with unrolled algorithms | 1 + .../Learning to summarize with human feedback | 0 ...plications to Variational and Ensemble methods | 1 + ...arning with Differentiable Pertubed Optimizers | 0 ...ued Kernels in Reproducing Kernel Krein Spaces | 1 + ...ning without Sparsity and Low-Rank Assumptions | 1 + ...kovian Data: Fundamental Limits and Algorithms | 2 ++ ...of KL Regularization in Reinforcement Learning | 1 + ...vex Optimization via Gradient-based Algorithms | 1 + ...t Need Complex Weight Posterior Approximations | 1 + ...olicies for Faster Training Without Forgetting | 1 + ...al Networks for Text-Guided Image Manipulation | 1 + ...n Detection Score For Variational Auto-encoder | 1 + ... on Testing Structural Changes in Ising Models | 1 + ...Limits to Depth Efficiencies of Self-Attention | 1 + ...esentations and Unsupervised Action Estimation | 1 + ...ical Systems as a Core Computational Primitive | 1 + ...e Sinkhorn Divergences using Positive Features | 1 + ...near-Sample Learning of Low-Rank Distributions | 1 + .../Linearly Converging Error Compensated SGD | 1 + ...rovably Robust Training by Laplacian Smoothing | 1 + ...-Certifiable Training with a Tight Outer Bound | 1 + ... Mean Estimation via Iterative Multi-Filtering | 1 + ...ning to Sounds of Silence for Speech Denoising | 1 + ...oCo: Local Contrastive Representation Learning | 1 + ...entially Private (Contextual) Bandits Learning | 1 + ...butions is faster using interactive mechanisms | 1 + ...Locally-Adaptive Nonparametric Online Learning | 1 + ...and Quantifiable Neural Distribution Alignment | 1 + .../neurips/Logarithmic Pruning is All You Need | 1 + ... Partially Observable Linear Dynamical Systems | 1 + ... with Goal-Conditioned Hierarchical Predictors | 1 + ...od and Removing the Bad Momentum Causal Effect | 1 + ... Pose and Shape for 3D Human Mesh Registration | 1 + ...-Resampling with Spatially Stochastic Networks | 1 + ... Awareness to Task Embedding for Meta Learning | 1 + .../MCUNet: Tiny Deep Learning on IoT Devices | 1 + ...ks: Group Symmetries in Reinforcement Learning | 1 + ...Ensemble Imbalanced Learning with MEta-SAmpler | 1 + ...ural Networks by Maximizing the Minimal Angles | 1 + .../MOPO: Model-based Offline Policy Optimization | 1 + ...eL: Model-Based Offline Reinforcement Learning | 1 + ...rmuted Pre-training for Language Understanding | 1 + .../MRI Banding Removal via Adversarial Training | 1 + ...Shot Video Object Segmentation Efficient Again | 1 + ...chastic Control (Almost) as Easy as Stochastic | 2 ++ ... non-Euclidean latent structure in neural data | 1 + .../Manifold structure in graph embeddings | 1 + ... in Continuous or Large Discrete Action Spaces | 1 + ... 
...Insufficient for Explaining Gradient Boosting | 1 +
...etion with Hierarchical Graph Side Information | 1 +
...Inference and Estimation in Multi-Layer Models | 1 +
...rn Gaussian Processes on Riemannian Manifolds" | 1 +
...ion for Improved Generalization and Robustness | 1 +
...al Distribution Shifts in Image Classification | 1 +
...n in Neural Proof Generation with Transformers | 1 +
...oned Policies for Learning from Sparse Rewards | 1 +
...r Dynamical Systems for Prediction and Control | 1 +
...MeshSDF: Differentiable Iso-Surface Extraction | 4 ++++
.../Meta-Consolidation for Continual Learning | 3 +++
...t Learning with an Objective Discovered Online | 1 +
.../Meta-Learning Requires Meta-Augmentation | 1 +
...Prediction with Convolutional Neural Processes | 1 +
... through Hebbian Plasticity in Random Networks | 1 +
.../Meta-Learning with Adaptive Hyperparameters | 1 +
data/2020/neurips/Meta-Neighborhoods | 1 +
...from Tasks with Heterogeneous Attribute Spaces | 1 +
...-trained agents implement Bayes-optimal agents | 1 +
...izer for Heterogeneous Tasks and Architectures | 1 +
...cal General-purpose Clean-label Data Poisoning | 1 +
...taSDF: Meta-Learning Signed Distance Functions | 1 +
...ic-Free Individual Fairness in Online Learning | 1 +
...nd: Regularization, Approximation and Numerics | 1 +
...nostic Compression of Pre-Trained Transformers | 1 +
... Stochastic Approximate Proximal Point Methods | 1 +
...cal SGD for Heterogeneous Distributed Learning | 1 +
.../Minimax Bounds for Generalized Linear Models | 1 +
...ation with 0-1 Loss and Performance Guarantees | 1 +
... Networks of Excitatory and Inhibitory Neurons | 1 +
...inimax Estimation of Conditional Moment Models | 1 +
...th Linear and One-hidden Layer Neural Networks | 1 +
... Estimation of Heterogeneous Treatment Effects | 1 +
...nline Convex Optimization: No Phase Transition | 1 +
... Off-Policy Evaluation and Policy Optimization | 4 ++++
...l Learning via Instance-Aware Parameterization | 1 +
...eer Review via Randomized Reviewer Assignments | 1 +
...for Learning Models from Mixture Distributions | 1 +
...lo for Mixed Discrete and Continuous Variables | 1 +
.../Model Agnostic Multilevel Explanations | 1 +
.../Model Class Reliance for Random Forests | 1 +
.../neurips/Model Fusion via Optimal Transport | 1 +
...sting Resolution, Depth and Width for TinyNets | 1 +
...uction System via Automated Online Experiments | 1 +
...ction in Contextual Stochastic Bandit Problems | 1 +
...rkov Games with Near-Optimal Sample Complexity | 1 +
...-based Adversarial Meta-Reinforcement Learning | 1 +
...ptimization with Unsupervised Model Adaptation | 1 +
...emi-Markov Decision Processes with Neural ODEs | 1 +
...astic Processes with Dynamic Normalizing Flows | 1 +
.../Modeling Noisy Annotations for Crowd Counting | 1 +
...in Neuroimaging Studies through MultiView ICA | 3 +++
...tion in the Brain via Zero-Shot MEG Prediction | 1 +
...ng and Optimization Trade-off in Meta-learning | 1 +
...Attention for Immune Repertoire Classification | 1 +
.../neurips/Modular Meta-Learning with Shrinkage | 1 +
...rating Momentum into Recurrent Neural Networks | 1 +
.../Monotone operator equilibrium networks | 1 +
...ment Pruning: Adaptive Sparsity by Fine-Tuning | 1 +
...Compression of LiDAR using Deep Entropy Models | 1 +
...Bayesian Optimization via Deep Neural Networks | 1 +
...lti-Plane Program Induction with 3D Box Priors | 1 +
...with Probabilistic Safety Barrier Certificates | 1 +
data/2020/neurips/Multi-Stage Influence Function | 1 +
...einforcement Learning with Soft Modularization | 1 +
...s for On-Device Contactless Vitals Measurement | 1 +
...ajectory Prediction with Fuzzy Query Attention | 1 +
...gent active perception with prediction rewards | 1 +
.../Multi-label Contrastive Predictive Coding | 1 +
...bset accuracy really conflict with each other? | 1 +
...t Estimation and Automatic Structure Discovery | 1 +
...ch Reinforcement Learning with Metric Learning | 1 +
...i-task Causal Learning with Gaussian Processes | 1 +
...antic Map Memory using Multi-Object Navigation | 1 +
...y Estimation for Label-Efficient Deep Learning | 1 +
...hical Partitioning and Data-dependent Grouping | 1 +
...e Learning Utilizing Jensen-Shannon-Divergence | 1 +
...al Generalization in Visual Question Answering | 1 +
...istence Image for Topological Machine Learning | 1 +
... for Parametric Partial Differential Equations | 1 +
.../neurips/Multiscale Deep Equilibrium Models | 1 +
...ction by Disentangling Geometry and Appearance | 1 +
.../neurips/Munchausen Reinforcement Learning | 1 +
...sivity as a challenge for deep neural networks | 1 +
data/2020/neurips/Myersonian Regression | 1 +
...E: A Deep Hierarchical Variational Autoencoder | 1 +
...zing Flows with Sublinear Parameter Complexity | 1 +
data/2020/neurips/Natural Graph Networks | 1 +
...thod for Constrained Markov Decision Processes | 1 +
.../Near-Optimal Comparison Based Clustering | 1 +
...-Optimal Reinforcement Learning with Self-Play | 1 +
... Halfspaces and ReLUs under Gaussian Marginals | 1 +
...work Diffusions via Neural Mean-Field Dynamics | 1 +
...n memorization with two-layers neural networks | 0
...on with Conditional Invertible Neural Networks | 1 +
...ng for supervised learning with missing values | 1 +
data/2020/neurips/Neural Anisotropy Directions | 1 +
.../Neural Architecture Generator Optimization | 1 +
... Evaluating Safety-Critical Autonomous Systems | 1 +
data/2020/neurips/Neural Complexity Measures | 1 +
...fferential Equations for Irregular Time Series | 1 +
... Policies for End-to-End Sensorimotor Learning | 1 +
...ution Engines: Learning to Execute Subroutines | 1 +
...ral FFTs for Universal Texture Image Synthesis | 1 +
...eural Manifold Ordinary Differential Equations | 1 +
...nifold Mesh Generation via Diffeomorphic Flows | 1 +
...i-Relational Ordered and Recursive Hypergraphs | 1 +
...l Methods for Point-wise Dependency Estimation | 1 +
...to Learn Periodic Functions and How to Fix It | 1 +
...ization with (almost) no Over-Parameterization | 2 ++
...al Networks with Recurrent Generative Feedback | 1 +
...th Small Weights and Depth-Separation Barriers | 1 +
data/2020/neurips/Neural Non-Rigid Tracking | 1 +
...derstanding the role of gates in deep learning | 1 +
data/2020/neurips/Neural Power Units | 1 +
...al Sparse Representation for Image Restoration | 1 +
data/2020/neurips/Neural Sparse Voxel Fields | 1 +
...Neural Star Domain as Primitive Representation | 1 +
...ural Topographic Factor Analysis for fMRI Data | 1 +
...Distance Fields for Implicit Function Learning | 1 +
.../neurips/Neural encoding with visual attention | 1 +
...euron Merging: Compensating for Pruned Neurons | 1 +
...n Shapley: Discovering the Responsible Neurons | 1 +
...uctured Pruning using Polarization Regularizer | 1 +
.../neurips/Neuronal Gaussian Process Regression | 1 +
...nt Learning with Formally Verified Exploration | 1 +
...lic Transformers for Multi-Agent Communication | 1 +
... Self-Selection Bias in Sampling for Sortition | 1 +
...Parallel GPU Task Scheduling for Deep Learning | 1 +
...ness in Coarse-Grained Classification Problems | 1 +
...mics for Extensive-Form Correlated Equilibrium | 1 +
...ing and Mixed Nash Equilibria: They Do Not Mix | 1 +
... Competitions under Consumer Reference Effects | 1 +
... Labels via Meta Transformed Network Embedding | 1 +
...t Low-Rank Representations of Complex Networks | 2 ++
...ve Estimation for Multivariate Point Processes | 1 +
...ng A Self-Supervised Bound for Image Denoising | 1 +
...Learns Halfspaces with Adversarial Label Noise | 1 +
...sion for Distributional Reinforcement Learning | 0
.../neurips/Non-Euclidean Universal Approximation | 1 +
.../Non-Stochastic Control with Bandit Feedback | 1 +
...n-parametric Models for Non-negative Functions | 1 +
...ying latent dynamical structure in neural data | 1 +
... Spiked Matrix Recovery with Generative Priors | 1 +
...ng under Laplacian Constrained Graphical Model | 1 +
... Filters for Multivariate Time Series Analysis | 1 +
...ing to Weight Data in Semi-supervised Learning | 1 +
...ational Space for Sample Efficient Exploration | 1 +
...rtial Differential Equations via Deep Learning | 1 +
...iversal Approximability of Sparse Transformers | 1 +
...t-of-Distribution Detection and Classification | 1 +
... Optimal Transport Approach for Topic Modeling | 1 +
...ation using Goal-Oriented Semantic Exploration | 1 +
.../Object-Centric Learning with Slot Attention | 1 +
data/2020/neurips/Ode to an ODE | 1 +
...for External Validity under a Covariate Shift | 1 +
...licy Evaluation via the Regularized Lagrangian | 1 +
...ff-Policy Imitation Learning from Observations | 1 +
...rval Estimation with Lipschitz Value Iteration | 1 +
...uential Decisions Under Unobserved Confounding | 1 +
...itation Learning with a Misspecified Simulator | 1 +
.../On 1 n neural representation and robustness | 1 +
...aptive Attacks to Adversarial Example Defenses | 1 +
data/2020/neurips/On Adaptive Distance Estimation | 2 ++
...ept-Based Explanations in Deep Neural Networks | 1 +
...ergence and Generalization of Dropout Training | 1 +
...ghbor Classifiers over Feature Transformations | 1 +
...fferentiation for Non-Differentiable Functions | 1 +
...iciency in Hierarchical Reinforcement Learning | 1 +
data/2020/neurips/On Infinite-Width Hypernetworks | 1 +
...Ising Models under Huber's Contamination Model | 1 +
.../neurips/On Numerosity of Deep Neural Networks | 1 +
data/2020/neurips/On Power Laws in Deep Ensembles | 2 ++
.../neurips/On Regret with Multiple Best Arms | 1 +
...nt Learning with Linear Function Approximation | 1 +
...econd Order Behaviour in Augmented Neural ODEs | 1 +
data/2020/neurips/On Testing of Samplers | 3 +++
...onvergence and Low-Norm Interpolation Learning | 1 +
.../On Warm-Starting Neural Network Training | 1 +
...king via sorting by estimated expected utility | 1 +
...hastic Gradient Descent in Non-Convex Problems | 1 +
...egularized Approximate Value Iteration Schemes | 1 +
...vate Learnability beyond Binary Classification | 1 +
...rmality of Randomized Midpoint Sampling Method | 1 +
...he Error Resistance of Hinge-Loss Minimization | 1 +
...roximate Inference in Bayesian Neural Networks | 1 +
...dentifying Challenges and How to Overcome Them | 1 +
.../neurips/On the Modularity of Hypernetworks | 1 +
...Power of Louvain in the Stochastic Block Model | 1 +
...y and DAG Constraints for Learning Linear DAGs | 1 +
...between the Laplace and Neural Tangent Kernels | 1 +
...ning: A Case Study on Linear Quadratic Systems | 1 +
...fer Learning: The Importance of Task Diversity | 1 +
... Certifying Robustness to Adversarial Examples | 1 +
...ff between Adversarial and Backdoor Robustness | 1 +
...ribution Testing: An Example of Goodhart's Law | 1 +
... neural networks and the stability of learning | 1 +
...nd molecular wave function with poor basis set | 1 +
...s: when and why the tangent kernel is constant | 1 +
...deoff between Robustness and Accuracy for Free | 1 +
...ably Robust Geometric Perception with Outliers | 1 +
...ew-Shot Extrapolation via Structured MaxEnt RL | 1 +
.../One-bit Supervision for Image Classification | 1 +
...ple Guided Object Representation Disassembling | 1 +
...line Agnostic Boosting via Regret Minimization | 3 +++
...quential Selection with Contextual Information | 1 +
...ti-shop Ski Rental with Machine Learned Advice | 1 +
...ference for Boundedly Rational Planning Agents | 1 +
data/2020/neurips/Online Bayesian Persuasion | 1 +
... Optimization Over Erdos-Renyi Random Networks | 1 +
...sed Visual Tracking via Reinforcement Learning | 1 +
...(OSAKA): a New Approach to Continual Learning | 1 +
...ence Maximization under Linear Threshold Model | 1 +
...Contextual Bandits using Gated Linear Networks | 1 +
...ine Learning with Primary and Secondary Losses | 1 +
.../Online Linear Optimization with Many Hints | 1 +
...MAP Inference of Determinantal Point Processes | 1 +
...Online Matrix Completion with Side Information | 1 +
...c Learning for Off-Policy Actor-Critic Methods | 1 +
...nline Multitask Learning with Long-Term Memory | 1 +
...nnectivity Estimation with Noisy Group Testing | 1 +
...on-Convex Optimization with Imperfect Feedback | 1 +
...timization with Memory and Competitive Control | 1 +
.../Online Planning with Lookahead Policies | 1 +
...nline Robust Regression via SGD on the l1 loss | 1 +
...ptimal Transport distances from sample streams | 1 +
data/2020/neurips/Online Structured Meta-learning | 1 +
... learning with dynamics: A minimax perspective | 3 +++
...hmark: Datasets for Machine Learning on Graphs | 1 +
... Maximize Simultaneously Recorded Neuron Yield | 1 +
... Multi-Armed Bandits with Heavy Tailed Rewards | 1 +
... - Smoothness Tradeoffs for Soft-Max Functions | 1 +
...imal Best-arm Identification in Linear Bandits | 1 +
...escent Ascent Methods for Min-Max Optimization | 1 +
...h the Subsampled Randomized Hadamard Transform | 1 +
.../Optimal Learning from Verified Training Data | 1 +
...ogarithmic Over-Parameterization is Sufficient | 1 +
...the Number of Unseen Species with Multiplicity | 1 +
...ation under Minimal Distributional Assumptions | 1 +
...exity of Secure Stochastic Convex Optimization | 1 +
...-offs for Learning-Augmented Online Algorithms | 1 +
...dient Estimator for Importance-Weighted Bounds | 1 +
...and Strongly Convex Decentralized Optimization | 1 +
...ceiving a Learning Leader in Stackelberg Games | 1 +
...Coherent Non-monotone Variational Inequalities | 1 +
...plication to Multi-scale Graph Neural Networks | 1 +
...l Networks with Quadratic Activation Functions | 6 ++++++
...imizing Mode Connectivity via Neuron Alignment | 1 +
...ng Neural Networks via Koopman Operator Theory | 1 +
... offering using an individual treatment effect | 1 +
... task-computation to enable continual learning | 1 +
...stimation with Subgaussian Rates via Stability | 1 +
...nalysis Overcoming the Curse of Dimensionality | 1 +
...rmless for Basis Pursuit, But Only to a Degree | 1 +
.../PAC-Bayes Analysis Beyond the Usual Bounds | 2 ++
...es Learning Bounds for Sample-Dependent Priors | 1 +
...yesian Bound for the Conditional Value at Risk | 1 +
...loration for Provable Policy Gradient Learning | 1 +
.../PEP: Parameter Ensembling by Perturbation | 1 +
...l Model Explanations for Graph Neural Networks | 1 +
...NET: Parametric Inference of Point Cloud Edges | 1 +
...S: Neuro-Symbolic Program Learning from Videos | 1 +
...pological Layer based on Persistent Landscapes | 0
...inuous Space MDPs with Non-Asymptotic Analysis | 1 +
.../POMDPs in Continuous Time and Discrete Spaces | 1 +
...ith Multiple Optima for Reinforcement Learning | 1 +
.../PRANK: motion Prediction based on RANKing | 1 +
.../Parabolic Approximation Line Search for DNNs | 1 +
...rameterized Explainer for Graph Neural Network | 1 +
...ation for Unsupervised Visual Feature learning | 1 +
...Noise: Towards Instance-dependent Label Noise | 1 +
.../neurips/Partially View-aligned Clustering | 1 +
...MRI with Self-Supervised Learning\342\200\213" | 1 +
...volution and Pooling for Graph Neural Networks | 1 +
...ient Estimators for Stochastic Binary Networks | 1 +
...ing penalty for smooth and log-concave targets | 1 +
...mechanism for differentially private selection | 1 +
...lized Federated Learning with Moreau Envelopes | 1 +
...ntees: A Model-Agnostic Meta-Learning Approach | 1 +
...ard and Strict Blackbox Attack Transferability | 1 +
...tatistical and computational phase transitions | 1 +
...ing Approximate Nash Equilibria in Large Games | 1 +
...tive for Domain Adaptive Semantic Segmentation | 1 +
...lanning With Sparse Rewards and Multiple Goals | 1 +
...Processes with Gap-Dependent Sample Complexity | 1 +
...bjective Functions: Going Beyond Total Rewards | 1 +
...ection in high-dimensional neural spike trains | 1 +
data/2020/neurips/Pointer Graph Networks | 1 +
... Improvement via Imitation of Multiple Oracles | 1 +
...Form Games with Public Chance Moves and Beyond | 1 +
...: An End-to-End Learning and Control Framework | 1 +
...ed Gradient for Model Quantization and Pruning | 1 +
...erarchical Data Augmentation for Deep Networks | 1 +
...ut OOD Samples via Density-Based Pseudo-Counts | 2 ++
...sterior Re-calibration for Imbalanced Datasets | 1 +
...ion Compression in Decentralized Deep Learning | 1 +
...ctical No-box Adversarial Attacks against DNNs | 1 +
...wton Methods for Training Deep Neural Networks | 1 +
data/2020/neurips/Pre-training via Paraphrasing | 1 +
...: Low-rank approximation and randomized Newton | 1 +
.../Predicting Training Time Without Training | 1 +
.../Prediction with Corrupted Expert Advice | 1 +
...dictive Information Accelerates Learning in RL | 1 +
...d neural networks with noise, chaos and delays | 1 +
...ce is free with the jackknife+-after-bootstrap | 1 +
...ultiple criteria: A game-theoretic perspective | 1 +
...forcement Learning with Finite-Time Guarantees | 1 +
...roximal Stochastic Gradient Langevin Algorithm | 1 +
...Primal-Dual Mesh Convolutional Neural Networks | 1 +
...cipal Neighbourhood Aggregation for Graph Nets | 1 +
.../Privacy Amplification via Random Check-Ins | 1 +
...ity Testing for High-Dimensional Distributions | 1 +
...onstruction and Reducing the Sample Complexity | 1 +
.../neurips/Probabilistic Active Meta-Learning | 1 +
...ational Inference in Discrete Graphical Models | 1 +
data/2020/neurips/Probabilistic Fair Clustering | 1 +
...heoretical Limits and Practical Approximations | 1 +
...babilistic Linear Solvers for Machine Learning | 1 +
...on Estimation with Matrix Fisher Distributions | 1 +
...Forecasting with Shape and Temporal Diversity | 0
...bly Approximately Correct Constrained Learning | 1 +
...rnability and Compressibility of Distributions | 1 +
...Program Synthesis with Pragmatic Communication | 1 +
.../Projected Stein Variational Gradient Descent | 1 +
...ethod and Optimal Nonsmooth Frank-Wolfe Method | 1 +
...sserstein Distance and Riemannian Optimization | 1 +
...ion in Multi-Agent Deep Reinforcement Learning | 1 +
...a a Simple and Efficient Regularization Method | 1 +
...on: Predicting Attention with Future Attention | 1 +
...of a Structured Tensor via Dictionary Learning | 1 +
...lapping Community Detection in Weighted Graphs | 1 +
.../Provably Consistent Partial-Label Learning | 1 +
...forcement Learning Using Unsupervised Learning | 1 +
...tural Equation Models: An Adversarial Approach | 1 +
...y Efficient Neural GTD for Off-Policy Learning | 0
...ter Optimization with Population-Based Bandits | 1 +
...with Kernel and Neural Function Approximations | 0
...gnostic Navigation with Linear Value Iteration | 1 +
...inforcement Learning Without Great Exploration | 0
data/2020/neurips/Provably Robust Metric Learning | 1 +
...aptive reinforcement learning in metric spaces | 1 +
.../Proximal Mapping for Deep Regularization | 1 +
...trix Perspective Function and its Applications | 1 +
data/2020/neurips/Pruning Filter in Filter | 1 +
...y data by iteratively conserving synaptic flow | 1 +
...g at Cloud Scale with Microsoft Floating Point | 1 +
...lic Programming for Automated Machine Learning | 1 +
...l Concepts Emerging in Representation Learning | 1 +
... Measures: Beating the Curse of Dimensionality | 1 +
...for Wasserstein-Approximate Gaussian Processes | 1 +
...ation of Chaos for SGD in Wide Neural Networks | 1 +
data/2020/neurips/Quantized Variational Inference | 1 +
...evant mechanism for sequential decision-making | 1 +
...on Attention Network for Semantic Segmentation | 1 +
...Transient Tasks for Continual Image Captioning | 1 +
...Scene Synthesis Using Structured Latent Spaces | 1 +
... Benchmarks for Offline Reinforcement Learning | 0
...n-linear Pooling for RAM Constrained Inference | 1 +
... Sample-based Keypoint Detector and Descriptor | 1 +
...Data Augmentation with a Reduced Search Space | 1 +
.../Random Reshuffling is Not Always Better | 1 +
...ffling: Simple Analysis with Vast Improvements | 1 +
.../neurips/Random Walk Graph Neural Networks | 1 +
...ession: A more efficient and powerful solution | 1 +
...Projection Alternative to the Softmax Function | 1 +
...rmulation of Wasserstein Discriminant Analysis | 1 +
data/2020/neurips/Rational neural networks | 1 +
...ngs for High-Dimensional Bayesian Optimization | 1 +
.../Real World Games Look Like Spinning Tops | 1 +
...-Time Dynamical Systems using Polynomial Forms | 1 +
...ersarial Learning via Characteristic Functions | 1 +
...mization Analyses: The Intrinsic Learning Rate | 1 +
...rative Objectives For Counterfactual Reasoning | 1 +
...ages from Brain Activity by Shape-Semantic GAN | 1 +
...e linear classifiers from mixture of responses | 1 +
.../neurips/Recurrent Quantum Neural Networks | 1 +
...ls for Multiple Interacting Neural Populations | 1 +
...cursive Inference for Variational Autoencoders | 1 +
...lly Robust Learning to Non-Robust PAC Learning | 1 +
...ability using Self-Supervised Object Proposals | 1 +
...sion with reject option and application to kNN | 1 +
...Online Learning with Relative-Lipschitz Losses | 2 ++
.../Regret in Online Recommendation Systems | 1 +
...s recover the principal components, eventually | 1 +
...Black-box Models for Improved Interpretability | 1 +
...rds Permutation Invariance In Recurrent Models | 2 ++
...mization with Neighborhood-Controlled Grammars | 1 +
...Learning for Control with Multiple Frequencies | 1 +
...ter Regret Bounds for the Non-Episodic Setting | 1 +
.../Reinforcement Learning with Augmented Data | 1 +
...ial Actions: An Application to Vehicle Routing | 1 +
.../Reinforcement Learning with Feedback Graphs | 1 +
...fficient Approach via Bounded Eluder Dimension | 1 +
...enchmark for Grounding Spatial Relations in 3D | 1 +
...s for Object Detection via Transformer Decoder | 1 +
...he Jacobian term in unsupervised deep learning | 1 +
...e Graph Neural Networks via Robust Aggregation | 1 +
...ularization by Maximizing Functional Entropies | 1 +
...fication Meets Regression for Object Detection | 1 +
...ameterizing Mirror Descent as Gradient Descent | 1 +
...amics for Bayesian Learning on Large Datasets" | 1 +
...Outcomes to Optimize Individualized Treatment | 1 +
...escuing neural spike train models from bad MLE | 1 +
...ts Recurrent Kernels and Structured Transforms | 1 +
...ortable Deep Neural Networks without Shortcuts | 1 +
...havior Imitation and Extended Motion Synthesis | 1 +
...mplexity Algorithm for Online Restless Bandits | 1 +
...ative Information in Few-Shot Object Detection | 1 +
...ing for Deep Learning under Distribution Shift | 1 +
...able Tree Filter for Generic Feature Transform | 1 +
.../Rethinking Pre-training and Self-training | 1 +
.../Rethinking pooling in graph neural networks | 1 +
...Labels for Improving Class-Imbalanced Learning | 1 +
...d Generation for Knowledge-Intensive NLP Tasks | 1 +
...mpose Retrosynthesis Prediction Like A Chemist | 1 +
...ions to a hierarchical inference task for mice | 1 +
...Polytopes: Strict Complementarity and Sparsity | 1 +
...ing for Automatic Neural Channel Number Search | 1 +
...e Spectrum Approximation of Gaussian Processes | 1 +
...Propagation Using Graph Convolutional Networks | 1 +
...oice: A unifying formalism for reward learning | 1 +
...RL: Hindsight Inference for Policy Improvement | 1 +
...tions by Following Eigenvectors of the Hessian | 1 +
.../Riemannian Continuous Normalizing Flows | 1 +
...g: Near-Optimal Risk-Sample Tradeoff in Regret | 1 +
...g Bias using Cumulative Distribution Functions | 1 +
...dversarial Perturbations on State Observations | 1 +
...bust Density Estimation under Besov IPM Losses | 1 +
...ust Disentanglement of a Few Factors at a Time | 1 +
...arning: The Case of Affine Distribution Shifts | 1 +
...stimation in Nearly-Matrix Multiplication Time | 1 +
...for Mixed Linear Regression with Small Batches | 1 +
... Reinforcement Learning with Model Uncertainty | 1 +
... Reweighting of the Graph Connection Laplacian | 1 +
...s in Generative Modeling and Domain Adaptation | 1 +
...ation for Fairness with Noisy Protected Groups | 1 +
...Persistence Diagrams using Reproducing Kernels | 1 +
...e-Training by Adversarial Contrastive Learning | 1 +
...obust Quantization: One Model to Rule Them All | 1 +
...earning Rates for Double Over-parameterization | 1 +
...atment Effects with Uncertainty Quantification | 1 +
...ia Adversarial training with Langevin Dynamics | 1 +
.../Robust Sequence Submodular Maximization | 1 +
...nalysis and Width-Independent Schatten Packing | 1 +
...stimation Made Simple, via Regret Minimization | 3 +++
...ust compressed sensing using generative models | 1 +
...bust large-margin learning in hyperbolic space | 1 +
...chastic Optimization for Variational Inference | 1 +
...trol of Linear Systems: beyond Quadratic Costs | 1 +
...tic Gradient Descent using Biased Expectations | 1 +
...sian Neural Networks to Gradient-Based Attacks | 1 +
...ty Detection to Random Geometric Perturbations | 1 +
data/2020/neurips/Rotated Binary Neural Network | 1 +
...bal Representation Learning for 3D Point Cloud | 1 +
... Self-Attention via Sparse Adaptive Connection | 1 +
...ic Control for Reliable Neural Network Pruning | 1 +
...ce 3D Object Reconstruction from Static Images | 1 +
...oto-Translation Equivariant Attention Networks | 1 +
...pplications in Radar and Satellite Meteorology | 1 +
...mponent convexity and large epoch requirements | 1 +
...ced Network For Spatial Description Resolution | 1 +
...nknown dynamical systems with long-term memory | 1 +
...fficient Attention using Asymmetric Clustering | 1 +
...SOLOv2: Dynamic and Fast Instance Segmentation | 1 +
...Simple Temporal Regularization For Neural ODE | 1 +
...nforced Multivariate Recurrent Neural Networks | 1 +
..., Robust, Fast Distribution Learning Algorithm | 1 +
...in gradient flow of the chi-squared divergence | 1 +
...einforcement Learning via Curriculum Induction | 1 +
...rning: Sharper Analysis and Variance Reduction | 1 +
...ty of Uniform Convergence for Multicalibration | 1 +
...cement Learning via Low-Rank Matrix Estimation | 1 +
...ffective dimension for regression on manifolds | 1 +
...Deep Generative Models via Weighted Retraining | 1 +
...Reinforcement Learning of Undercomplete POMDPs | 1 +
...ling from a k-DPP without looking at all items | 1 +
...ecomposable Generative Adversarial Recommender | 1 +
...ng Methods: Random Tickets can Win the Jackpot | 1 +
...able Belief Propagation via Relaxed Scheduling | 0
... Neural Networks via Bidirectional Propagation | 1 +
...ning for Networked Systems with Average Reward | 1 +
...r Communication-Efficient Distributed Training | 1 +
...Oversmoothness in Graph Convolutional Networks | 1 +
...r Low-Bit Weights in Quantized Neural Networks | 1 +
...x Optimization via Perturbed Gradient Tracking | 0
...Bayesian Bounds for the Weighted Majority Vote | 1 +
... Matching Problems with Machine Learned Advice | 0
...nd Seldonian Reinforcement Learning Algorithms | 1 +
...xplore: Curiosity via Audio-Visual Association | 1 +
...e Training: beyond Empirical Risk Minimization | 1 +
...03\251 from Focused and Defocused Image Pairs" | 1 +
...tion Amplifies Regularization in Hilbert Space | 1 +
...tillation as Instance-Specific Label Smoothing | 1 +
...earning via Generalized Lower Bound Q-learning | 1 +
...ations for Improving Gaze and Head Redirection | 1 +
.../Self-Paced Deep Reinforcement Learning | 1 +
...f-Supervised Few-Shot Learning on Point Clouds | 1 +
...-Supervised Generative Adversarial Compression | 1 +
...Learning by Cross-Modal Audio-Video Clustering | 1 +
.../Self-Supervised MultiModal Versatile Networks | 1 +
...lational Reasoning for Representation Learning | 1 +
.../neurips/Self-Supervised Relationship Probing | 1 +
...esentation Learning from Hierarchical Grouping | 1 +
...Hybrid Memory for Domain Adaptive Object Re-ID | 1 +
...rning with Meta-paths for Heterogeneous Graphs | 1 +
... Co-Training for Video Representation Learning | 1 +
...upervised learning through the eyes of a child | 1 +
...ids Using Spurious Features Under Domain Shift | 1 +
...c Visual Navigation by Watching YouTube Videos | 1 +
.../Semi-Supervised Neural Architecture Search | 1 +
...rning via Confidence-Rated Margin Maximization | 1 +
...ation for Lipschitz Constants of ReLU Networks | 1 +
...Analysis of Bias Due to Unobserved Confounding | 1 +
...perimental Design with Variable Cost Structure | 1 +
.../neurips/Set2Graph: Learning Graphs From Sets | 1 +
...w: Learnable Deformation Flows Among 3D Shapes | 0
...-Critic for Multi-Agent Reinforcement Learning | 1 +
...er Learning for analyzing multi-site fMRI data | 1 +
...ReLU Networks with Precise Dependence on Depth | 5 +++++
...rgence bounds through empirical centralization | 1 +
... an Application to Noisy, Iterative Algorithms | 1 +
...er Generalization Bounds for Pairwise Learning | 1 +
.../ShiftAddNet: A Hardware-Inspired Deep Network | 1 +
...r Binary Integer and Online Linear Programming | 0
...rministic Deep Learning via Distance Awareness | 1 +
... Sparse k-means Clustering via Feature Ranking | 1 +
...Sampling for Implicit Collaborative Filtering | 1 +
...NNs Improves Robustness to Image Perturbations | 1 +
...ce and Metric Learning from Paired Comparisons | 1 +
...dversarial Episodic MDPs with Known Transition | 1 +
...orn Barycenter via Functional Gradient Descent | 5 +++++
...inkhorn Natural Gradient for Generative Models | 1 +
...ion: From Global Inference to Local Adjustment | 1 +
...ng Window Algorithms for k-Clustering Problems | 1 +
...h Equilibrium Certificates in Very Large Games | 1 +
... And Consistent Probabilistic Regression Trees | 1 +
... of Online and Differentially Private Learning | 1 +
.../Smoothed Geometry for Robust Attribution | 1 +
...ing User Contributions in Differential Privacy | 1 +
.../SnapBoost: A Heterogeneous Boosting Machine | 1 +
...t Contrastive Learning for Visual Localization | 1 +
...ic Framework for Normalizing Flow on Manifolds | 1 +
...max Deep Double Deterministic Policy Gradients | 1 +
...Physics to Interact with Iterative PDE-Solvers | 1 +
...me Correspondence as a Contrastive Random Walk | 1 +
.../Sparse Graphical Memory for Robust Planning | 1 +
data/2020/neurips/Sparse Learning with CART | 1 +
...put Measures for Nonstationary Kernel Learning | 1 +
...arse Symplectically Integrated Neural Networks | 1 +
.../neurips/Sparse Weight Activation Training | 1 +
.../Sparse and Continuous Attention Mechanisms | 1 +
...angent Kernel for linear-width neural networks | 1 +
...twork for Multivariate Time-series Forecasting | 1 +
...Bayes for high dimensional logistic regression | 1 +
data/2020/neurips/Spin-Weighted Spherical CNNs | 1 +
...ic Gradient Descent on Nonsmooth Convex Losses | 2 ++
.../Stable and expressive recurrent vision models | 1 +
.../Stage-wise Conservative Linear Bandits | 1 +
...ynamic Deterministic Markov Decision Processes | 1 +
...s for Uncertainty Calibration in Deep Learning | 1 +
...ompson Sampling for Combinatorial Semi-Bandits | 1 +
...of Distributed Nearest Neighbor Classification | 1 +
...l Transport posed as Learning Kernel Embedding | 1 +
...l Properties of Sliced Probability Divergences | 1 +
...rce imaging with desparsified mutli-task Lasso | 0
...al-Query Lower Bounds via Functional Gradients | 1 +
...te Analysis of Episodic Reinforcement Learning | 1 +
...ighbors in Supervised Dimensionality Reduction | 1 +
...Repulsive Dynamics: Benefits From Past Samples | 1 +
...Stochastic Deep Gaussian Processes over Graphs | 1 +
...elated Settings: A Study on Gaussian Processes | 1 +
...orcement Learning with a Latent Variable Model | 1 +
data/2020/neurips/Stochastic Normalization | 1 +
data/2020/neurips/Stochastic Normalizing Flows | 1 +
...astic Optimization for Performative Prediction | 3 +++
...Tailed Noise via Accelerated Gradient Clipping | 1 +
...astic Optimization with Laggard Data Pipelines | 1 +
...ic Nonconvex-Strongly-Concave Minimax Problems | 1 +
...ing Spatially Correlated Aleatoric Uncertainty | 1 +
data/2020/neurips/Stochastic Stein Discrepancies | 1 +
...earning Rate for Multiscale Objective Function | 1 +
...hannel Pruning via Deep Reinforcement Learning | 1 +
...r Misinformation Prevention in Social Networks | 1 +
...Learning by Energy-based Distribution Matching | 1 +
...onstituency Parsing with Graph Neural Networks | 1 +
...supervised learning and local graph clustering | 1 +
...nvolutions for Efficient Neural Network Design | 1 +
...tured Prediction for Conditional Meta-Learning | 1 +
...Bayesian Optimisation in Unknown Search Spaces | 1 +
...or Efficient Non-Parametric Bandit Exploration | 1 +
data/2020/neurips/Subgraph Neural Networks | 1 +
...ubgroup-based Rank-1 Lattice Quasi-Monte Carlo | 1 +
...modular Maximization Through Barrier Functions | 1 +
data/2020/neurips/Submodular Meta-Learning | 1 +
...nt Communication With Temporal Message Control | 1 +
...on using principal optimal transport direction | 1 +
... A Generic Loss for Robust Curriculum Learning | 1 +
data/2020/neurips/Supermasks in Superposition | 1 +
data/2020/neurips/Supervised Contrastive Learning | 1 +
...tions to Bridge the Gap between VAEs and Flows | 1 +
...apping Autoencoder for Deep Image Manipulation | 1 +
...ng Learning Algorithms with Synthetic Datasets | 1 +
...earning to Repair for Neural Program Synthesis | 1 +
...Synthesizing Tasks for Block-based Programming | 1 +
...hetic Data Generators - Sequential and Private | 1 +
...nstraints: A Circuit Model of the Inner Retina | 1 +
...Semantic Pyramid for Sign Language Translation | 1 +
...roblem in Heterogeneous Federated Optimization | 1 +
...ete Integration via the Boon of Dimensionality | 2 ++
...l Perturbations for Monocular Depth Prediction | 1 +
... Inference of Gaussian Process Hyperparameters | 1 +
...with an Infinite Mixture of Gaussian Processes | 1 +
.../neurips/Task-Oriented Feature Distillation | 1 +
.../Task-Robust Model-Agnostic Meta-Learning | 1 +
...agnostic Exploration in Reinforcement Learning | 1 +
...s Sample-Efficient Natural Language Generation | 1 +
.../2020/neurips/Teaching a GAN What Not to Learn | 1 +
.../neurips/Telescoping Density-Ratio Estimation | 1 +
...ical Hypothesis Generation via Risk Estimation | 1 +
...ckpropagation for Deep Spiking Neural Networks | 1 +
...mporal Variability in Implicit Online Learning | 1 +
.../2020/neurips/Tensor Completion Made Practical | 1 +
.../neurips/Testing Determinantal Point Processes | 1 +
...re Interpolation for Probing Visual Perception | 1 +
...ty of Maximizing a Gross Substitutes Valuation | 1 +
...ning for Biased Regularization and Fine Tuning | 1 +
...All-or-Nothing Phenomenon in Sparse Tensor PCA | 1 +
.../The Autoencoding Variational Autoencoder | 1 +
.../neurips/The Complete Lasso Tradeoff Diagram | 1 +
...per Learning of Halfspaces with Agnostic Noise | 1 +
... of Silence: Speech Separation by Localization | 1 +
...on Relaxations for Neural Network Verification | 1 +
...on Exponential and Generalized Sylvester Flows | 1 +
... Macroscopic Prediction via Microscopic Models | 1 +
...hted TriHard Loss for Person Re-Identification | 1 +
...The Discrete Gaussian for Differential Privacy | 2 ++
.../The Diversified Ensemble Neural Network | 1 +
...l Privacy: Private Counting with Minimal Space | 1 +
...n-Stability Tradeoff In Neural Network Pruning | 1 +
...h Nonlinear Observations and Generative Priors | 1 +
...nge: Detecting Hate Speech in Multimodal Memes | 1 +
...al Correlation on Learning Some Deep Functions | 1 +
...Model-Based Behavior in Reinforcement Learning | 1 +
...icket Hypothesis for Pre-trained BERT Networks | 1 +
.../The MAGICAL Benchmark for Robust Imitation | 1 +
.../The Mean-Squared Error of Double Q-Learning | 1 +
.../2020/neurips/The NetHack Learning Environment | 1 +
...Texture Bias in Convolutional Neural Networks | 1 +
...Pitfalls of Simplicity Bias in Neural Networks | 1 +
...tts-Ising model for discrete multivariate data | 1 +
...isons for Actively Learning Linear Classifiers | 1 +
.../The Power of Predictions in Online Control | 1 +
...-Dual method for Learning Augmented Algorithms | 1 +
.../The Smoothed Possibility of Social Choice | 1 +
...cal Complexity of Early-Stopped Mirror Descent | 2 ++
.../neurips/The Strong Screening Rule for SLOPE | 3 +++
...arly-Time Learning Dynamics of Neural Networks | 1 +
...inciple for Model-Based Reinforcement Learning | 1 +
.../The Wasserstein Proximal Gradient Algorithm | 1 +
...ndomness and structure during learning in RNNs | 1 +
...f approximation rates for deep neural networks | 1 +
...ames: When is price of anarchy too optimistic? | 2 ++
...sification: A High-dimensional Asymptotic View | 1 +
...rized Differential Network Architecture Search | 1 +
...ology Design for Cross-Silo Federated Learning | 1 +
...oordinate Selection Solver for Sparse Learning | 1 +
...r Regret Bounds for Adversarial Linear Bandits | 1 +
...dient Descent under the Noiseless Linear Model | 1 +
...s for no-regret learning in multi-player games | 1 +
.../neurips/Time-Reversal Symmetric ODE Network | 1 +
... using Temporal Hierarchical One-Class Network | 1 +
...ot Parameters for Efficient On-Device Learning | 1 +
.../Top-KAST: Top-K Always Sparse Training | 1 +
...g GAN Performance by Throwing Away Bad Samples | 1 +
...arning Approach to Sequential Conformer Search | 1 +
...d the Fundamental Limits of Imitation Learning | 1 +
...er Generalization of Adaptive Gradient Methods | 1 +
... Analysis of Random Forests for Classification | 0
...etworks using Decentralized Mixture-of-Experts | 1 +
...tworks with Differentiable Group Normalization | 1 +
...standing with Explanations as Latent Variables | 1 +
.../Towards Learning Convolutions from Scratch | 1 +
...tween In-Domain & Out-of-Distribution Examples | 1 +
...Learning in Factored Markov Decision Processes | 1 +
...l Adversarial Attacks on Graph Neural Networks | 1 +
.../neurips/Towards Neural Programming Interfaces | 1 +
...ll MOBA Games with Deep Reinforcement Learning | 1 +
...wards Problem-dependent Optimal Learning Rates | 1 +
...afe Policy Improvement for Non-Stationary MDPs | 1 +
...ards Scalable Bayesian Learning of Causal DAGs | 1 +
... Problem Solving by Iterative Homogeneous GNNs | 0
... Generalizes Better Than Adam in Deep Learning | 1 +
...l Learning: Benefits of Neural Representations | 1 +
...Towards a Better Global Loss Landscape of GANs | 1 +
...al Characterization of Bounded-Memory Learning | 2 ++
... differentially private causal graph discovery | 1 +
...sentation Learning for Information Obfuscation | 1 +
...acy: Data Debugging in Collaborative Filtering | 1 +
...oupling Locations of Weights from Their Values | 1 +
...rks by Solving Ordinary Differential Equations | 1 +
...erative Adversarial Networks with Limited Data | 1 +
.../neurips/Training Linear Finite-State Machines | 1 +
...neck for Competitive Generative Classification | 1 +
...ng Stronger Baselines for Learning to Optimize | 1 +
...amics Generalization in Reinforcement Learning | 1 +
...fer Learning via \342\204\2231 Regularization" | 0
...h Lower Bias and Variance in Domain Adaptation | 1 +
...Transferable Graph Optimizers for ML Compilers | 1 +
...e! I am a low dimensional Hyperbolic Embedding | 1 +
...ds of overfitting: where & why do they appear? | 1 +
...Truncated Linear Regression in High Dimensions | 1 +
... Is Confident: Masked Model-based Actor-Critic | 2 ++
.../Truthful Data Acquisition via Peer Prediction | 1 +
...et: Single View Reconstruction in Object Space | 1 +
...iscovering of Constructive Solid Geometry Tree | 1 +
...raphy, Watermarking, and Light Field Messaging | 1 +
...el Capacity Weakly Supervised Object Detection | 1 +
...ecision 4-bit Training of Deep Neural Networks | 1 +
.../Ultrahyperbolic Representation Learning | 1 +
... a Modulo Image for High Dynamic Range Imaging | 1 +
data/2020/neurips/Unbalanced Sobolev Descent | 1 +
...y Aware Semi-Supervised Learning on Graph Data | 1 +
...y Quantification for Inferring Hawkes Networks | 1 +
...e Learning for Zero-Shot Semantic Segmentation | 1 +
...Self-training for Few-shot Text Classification | 1 +
...me-Varying fMRI Data using Cubical Persistence | 1 +
...ough Hierarchies of Distributions and Features | 1 +
...tural Gradient Descent in Wide Neural Networks | 1 +
...tanding Deep Architecture with Reasoning Layer | 0
...res A Fine-Grained Bias-Variance Decomposition | 1 +
...ontributions With Additive Importance Measures | 1 +
...ipping in Private SGD: A Geometric Perspective | 1 +
...ring the Network with Stochastic Architectures | 1 +
...anding and Improving Fast Adversarial Training | 1 +
...g spiking networks through convex optimization | 1 +
...Role of Training Regimes in Continual Learning | 1 +
...'s functions for optimized reservoir computing | 1 +
...nating Optimization for Blind Super Resolution | 1 +
...sed Learning Rules for Spiking Neural Networks | 1 +
...sal Domain Adaptation through Self Supervision | 1 +
.../Universal Function Approximation on Graphs | 1 +
...duction via a higher-order splitting criterion | 2 ++
.../Universally Quantized Neural Compression | 1 +
...lgorithms in Multi-Armed Bandit with Many Arms | 0
...sed Data Augmentation for Consistency Training | 1 +
...tations with Compositional Energy-Based Models | 1 +
...vised Learning of Dense Visual Representations | 1 +
...ynamics from Images for Prediction and Control | 1 +
...ect Landmarks via Self-Training Correspondence | 1 +
...al Features by Contrasting Cluster Assignments | 1 +
...resentation Learning by Invariance Propagation | 1 +
...Template Matching for Semi-Supervised Learning | 1 +
...nd Separation Using Mixture Invariant Training | 1 +
...rvised Text Generation by Learning from Search | 1 +
...upervised Translation of Programming Languages | 1 +
...ntric video generation and decomposition in 3D | 1 +
...d self-attention in artificial neural networks | 0
...cement Learning for CMDP with Adversarial Loss | 1 +
...Sequence Models for Continuous-Time Event Data | 1 +
...nt neural network structure and prune synapses | 1 +
...rative Model for Heterogeneous Mixed Type Data | 1 +
...and Semi-supervised Learning to Tabular Domain | 1 +
.../2020/neurips/Value-driven Hindsight Modelling | 1 +
...e Gradient Estimator for Variational Inference | 1 +
...ted Dual Averaging for Finite-Sum Optimization | 1 +
...Random Coordinate Descent-Langevin Monte Carlo | 3 +++
... Learning: Non-Asymptotic Convergence Analysis | 1 +
.../neurips/Variational Amodal Object Completion | 1 +
...al Bayesian Monte Carlo with Noisy Likelihoods | 1 +
data/2020/neurips/Variational Bayesian Unlearning | 1 +
...Absence of Graph Data and Adversarial Settings | 1 +
...Reinforcement Learning with General Utilities | 1 +
...eo Frame Interpolation without Temporal Priors | 1 +
...e Feature Bank and Uncertain-Region Refinement | 1 +
...escent Directions for Constrained Minimization | 1 +
...riational Inference for Bayesian Deep Learning | 1 +
...tein Distances for Stereo Disparity Estimation | 1 +
...urring the Vision of Your Deep Neural Networks | 1 +
... Training of High Resolution Normalizing Flows | 1 +
.../Weak Form Generalized Hamiltonian Learning | 1 +
...rvised Deep Functional Maps for Shape Matching | 0
...inforcement Learning for Controllable Behavior | 1 +
...on for Deep Multi-Agent Reinforcement Learning | 1 +
...Towards scalable higher-order graph embeddings | 1 +
...ston-Watkins Hinge Loss and Ordered Partitions | 1 +
...ning Agent Behaviour through Intended Outcomes | 1 +
...etworks Learn When Trained With Random Labels? | 1 +
...Makes for Good Views for Contrastive Learning? | 1 +
...overing the Long Tail via Influence Estimation | 2 ++
.../neurips/What if Neural Networks had SVDs? | 1 +
...hat is being transferred in transfer learning? | 1 +
...xploring datasets, architectures, and training | 1 +
...re importance for time-series black-box models | 1 +
.../When Counterpoint Meets Chinese Folk Melodies | 1 +
... Do Neural Networks Outperform Kernel Methods? | 1 +
...essment using Compartmental Gaussian Processes | 1 +
...etworks? - A Neural Tangent Kernel Perspective | 1 +
... Flows Fail to Detect Out-of-Distribution Data | 1 +
...re Adaptive Methods Good for Attention Models? | 1 +
...ing the Lottery with Continuous Sparsification | 1 +
... Improving Consistency of Deep Learning Models | 1 +
...r Approximation for Neural Network Compression | 2 ++
...bury Transformations for Deep Generative Flows | 1 +
...orst-Case Analysis for Randomly Collected Data | 1 +
...AL: Explicit Calibration for Survival Analysis | 1 +
...ecretly Suffice Multi-Source Domain Adaptation | 1 +
...hould Use Discriminator Driven Latent Sampling | 1 +
...Learning With Nonlinear Function Approximation | 1 +
...esource Knowledge-Grounded Dialogue Generation | 1 +
.../neurips/f-Divergence Variational Inference | 1 +
... for Generative Adversarial Imitation Learning | 1 +
...-Supervised Learning of Speech Representations | 1 +
...Exploration from Decentralized Learning Agents | 1 +
...th Correspondence Learning and Mesh Refinement | 1 +
...e Voxel-to-BEV Tracker for Sparse Point Clouds | 1 +
...Scene Perception via Probabilistic Programming | 1 +
...nerative Model for Structure-Based Drug Design | 1 +
...Recommender Systems in a Two-sided Marketplace | 1 +
...with Control in the Presence of Subpopulations | 1 +
...to Reasoning and Learning in Intuitive Physics | 1 +
... to Solve Combinatorial Optimization on Graphs | 1 +
...eural Network Sampler with Near-Optimal Regret | 1 +
...A Causal Lens for Controllable Text Generation | 1 +
...rem for Differentially Private Query Answering | 1 +
...case Behavior of Multi-armed Bandit Algorithms | 1 +
...Circuit Operations for Probabilistic Inference | 1 +
...ely Tight Analysis of Gradient Descent for PCA | 1 +
...for Learning Exponential Family Distributions | 1 +
...g Agent for Model-Based Reinforcement Learning | 1 +
...ndom-Order No-Substitution k-Median Clustering | 1 +
.../A Continuous Mapping For Augmentation Design | 1 +
...ch for Training Variational Autoencoder Priors | 1 +
...s of Gradient Descent on Graph Neural Networks | 1 +
...al Estimation with Deep Latent Variable Models | 1 +
...lgorithm with Order-Optimal Regret Performance | 1 +
...lized Algorithm for Nonconvex Minimax Problems | 1 +
...lgorithm with Applications in Machine Learning | 1 +
.../A Framework to Learn with Interpretation | 1 +
data/2021/neurips/A Gang of Adversarial Bandits | 1 +
... Mixture Model for Multi-Label Active Learning | 1 +
...of Neural Collapse with Unconstrained Features | 1 +
...ural Calibration via Sensitivity Decomposition | 1 +
...on and Its Role in Making Gradients Small Fast | 1 +
.../A Gradient Method for Multilevel Optimization | 1 +
...rge-scale Dynamic Pickup and Delivery Problems | 1 +
...n Application to Function-On-Scalar Regression | 1 +
...st of Independence for Cluster-correlated Data | 1 +
...garithm for Multi-Agent Reinforcement Learning | 1 +
... Robust Features for Targeted Transfer Attacks | 1 +
...nsferability in Multi-source Transfer Learning | 1 +
...n Entropy Framework for Reinforcement Learning | 1 +
...ist Approach to Offline Reinforcement Learning | 1 +
...Multi-Implicit Neural Representation for Fonts | 1 +
... for Debiasing Trained Machine Learning Models | 1 +
...astic Bilevel Optimization via Double-Momentum | 1 +
...k for Fast and Accurate Online Decision-Making | 1 +
...or Robust Acceleration in the Hyperbolic Plane | 1 +
...rithm for Positive Semidefinite Factorizations | 1 +
...e Algorithm for Independent Component Analysis | 1 +
... Note on Sparse Generalized Eigenvalue Problem | 1 +
...A PAC-Bayes Analysis of Adversarial Robustness | 1 +
...Inference from Differential Equations and Data | 1 +
...d Framework for Unsupervised Domain Adaptation | 1 +
...ing Method for Episodic Reinforcement Learning | 1 +
...Collection Strategy for Reinforcement Learning | 1 +
...proach to Learning-Augmented Online Algorithms | 1 +
...ata-oblivious and Data-aware Poisoning Attacks | 1 +
...el for Shape-Accurate 3D-Aware Image Synthesis | 1 +
... Algorithm for Distributed Convex Optimization | 1 +
...r Prediction+Programming with Soft Constraints | 0
...l Analysis of Fine-tuning with Linear Teachers | 1 +
...rtion-Perception Tradeoff in Wasserstein Space | 1 +
...Method for Contrastive Representation Learning | 1 +
.../A Topological Perspective on Causal Inference | 1 +
...ding Model for Hyperspectral Image Restoration | 1 +
... Online Learning via Blackwell Approachability | 1 +
...ied View of cGANs with and without Classifiers | 1 +
...A Universal Law of Robustness via Isoperimetry | 1 +
...ion-Based Generative Models and Score Matching | 1 +
...rks Can Improve Out-of-Distribution Robustness | 1 +
...ual method with adaptivity to local smoothness | 1 +
... neural population responses to natural images | 1 +
...nonparametric Bayesian model for whole genomes | 1 +
...rea recurrent network model of decision-making | 1 +
...al change problems with statistical guarantees | 1 +
...ributions based on optimal weak mass transport | 1 +
...ling-based circuit for optimal decision making | 1 +
...ptures feature learning effects in finite CNNs | 1 +
... examples on random two-layers neural networks | 1 +
...unified framework for bandit multiple testing | 1 +
...veals ongoing modulation of neural variability | 1 +
...oximate posterior for the deep Wishart process | 1 +
...for Learning Human Shape, Appearance, and Pose | 1 +
... for Class-imbalanced Semi-supervised Learning | 1 +
... DeCompressed Training of Deep Neural Networks | 1 +
...vation Compression with Guaranteed Convergence | 1 +
...ing of Negative Transfer in Continual Learning | 1 +
... Efficient Point Cloud Representation Learning | 1 +
...essive Transformers for Indoor Scene Synthesis | 1 +
...st for Detecting Heteroscedastic Relationships | 1 +
...ficient Method to Find N: M Transposable Masks | 1 +
...ratic Optimization with Reinforcement Learning | 1 +
...t Learning via Parameterized Action Primitives | 1 +
...ity for Multi-Objective Reinforcement Learning | 1 +
...cumulative Poisoning Attacks on Real-time Data | 1 +
...oud Registration with Robust Optimal Transport | 1 +
...ately Solving Rod Dynamics with Graph Learning | 1 +
...n and Knowledge Transfer in Continual Learning | 1 +
...ance with Bessel-Convolutional Neural Networks | 1 +
...r decoding by probabilistic manifold alignment | 1 +
.../Action-guided 3D Human Motion Prediction | 1 +
...sport Problem without Bidirectional Connection | 1 +
... 3D Shape Reconstruction from Vision and Touch | 1 +
...s Accuracy Surface Over Attribute Combinations | 1 +
...Active Learning of Convex Halfspaces on Graphs | 1 +
data/2021/neurips/Active Offline Policy Selection | 1 +
.../Active clustering for labeling training data | 1 +
...iables Given Only Response Variable Observable | 1 +
...Populations via a Generative Model of Policies | 1 +
... and growth conditions in private optimization | 1 +
...e Conformal Inference Under Distribution Shift | 1 +
.../Adaptive Data Augmentation on Temporal Graphs | 0
.../neurips/Adaptive Denoising via GainTuning | 1 +
.../Adaptive Diffusion in Graph Neural Networks | 1 +
... Minimizing Estimation Bias via Error Feedback | 1 +
...ex Minimization without Lipschitz Requirements | 1 +
data/2021/neurips/Adaptive Machine Unlearning | 1 +
...aptive Online Packing-guided Search for POMDPs | 1 +
...radient Methods for Structured Neural Networks | 1 +
...inimization: Learning to Adapt to Domain Shift | 1 +
...ptive Sampling for Minimax Fair Classification | 1 +
...n from neural networks through interpretations | 1 +
.../Adder Attention for Vision Transformer | 1 +
...erformance Inconsistency in Federated Learning | 1 +
...ated Errors in Neural Networks for Time Series | 1 +
...k Generation Empowered by Min-Max Optimization | 1 +
...eraging the Power of Geometric Transformations | 1 +
...on Graph Classifiers via Bayesian Optimisation | 0
.../Adversarial Examples Make Strong Poisons | 1 +
...sifiers Based on Higher-Order Voronoi Diagrams | 1 +
...l Examples in Multi-Layer Random ReLU Networks | 1 +
.../neurips/Adversarial Feature Desensitization | 1 +
...entation to Improve Graph Contrastive Learning | 1 +
...ntrinsic Motivation for Reinforcement Learning | 1 +
...Neuron Pruning Purifies Backdoored Deep Models | 1 +
...on with Doubly Non-negative Weighting Matrices | 1 +
...rial Reweighting for Partial Domain Adaptation | 1 +
...rial Robustness with Non-uniform Perturbations | 1 +
...stness with Semi-Infinite Constrained Learning | 1 +
...A Teacher-Guided Curriculum Learning Approach | 0
...resentation Learning for Domain Generalization | 1 +
...s Transfer Learning via Better Representations | 1 +
...oint Cloud Recognition Using Self-Supervisions | 1 +
.../Adversarially Robust Change Point Detection | 1 +
...ng for security-constrained optimal power flow | 1 +
... Observability for Deep Reinforcement Learning | 1 +
...rning with Low-Rank MDPs and Rich Observations | 1 +
... Instabilities of Accelerated Gradient Descent | 1 +
...of an unsupervised feature selection algorithm | 1 +
.../Alias-Free Generative Adversarial Networks | 1 +
...esentation Learning with Momentum Distillation | 1 +
... Learning for Efficient Image Super-Resolution | 1 +
...etection via Object-Level Contrastive Learning | 1 +
...ology for Self-Adaptive 3D Human Pose Recovery | 1 +
...ention by Matching Key and Query Distributions | 1 +
...beling for Training Better Vision Transformers | 1 +
...onfigurations Using a Differentiable Surrogate | 1 +
...ional Inference for Simple Hierarchical Models | 1 +
...n-convex Regime: Asymptotic Normality and Bias | 1 +
...Provably-Fair Welfare-Centric Machine Learning | 1 +
...hastic Linear Bandits with General Constraints | 1 +
...ramework for Multiagent Reinforcement Learning | 1 +
... Generalization with Empirical Risk Minimizers | 1 +
... of Adder Neural Networks for Object Detection | 1 +
...rithm: Minibatching and Interpolation Learning | 1 +
...e Generalization Error for the Gibbs Algorithm | 1 +
...morization Capacity of Deep Threshold Networks | 1 +
...Realizable MDP with Constant Suboptimality Gap | 1 +
...and Words: Towards Disentanglement in The Wild | 1 +
...tion under Without-replacement Sampling Orders | 0
...nt Tracking for Decentralized Machine Learning | 1 +
...ets That Fixes Their Asymptotic Overconfidence | 1 +
...tion-theoretic Approach to Distribution Shifts | 1 +
...Robust Optimization with Non-convex Objectives | 1 +
... for Stochastic Canonical Correlation Analysis | 1 +
...ple is a Price of Privacy-Preserving Microdata | 1 +
...of Ermakov-Zolotukhin quadrature using kernels | 1 +
...rithm for difference-of-squares classification | 1 +
... Algorithm: Designing a Unified Sequence Model | 1 +
...gnal Recovery under a Generalized Linear Model | 1 +
...layer neural networks via the resolvent method | 1 +
...ucture and Rank of Neural Network Hessian Maps | 1 +
...adigmatic High-Dimensional Non-Convex Problems | 1 +
...distillable Teachers in Knowledge Distillation | 1 +
... of SGLD Using Properties of Gaussian Channels | 1 +
...sal Queries With the Maximum Causal Set Effect | 1 +
...arning: Training Clean Models on Poisoned Data | 1 +
... of Label Differential Privacy: PATE and ALIBI | 1 +
... Minimization for Cardinality-Based Components | 1 +
...ization of convex functions with outlier noise | 1 +
...ing the Permanent with Deep Rejection Sampling | 1 +
...rbitrary Conditional Distributions with Energy | 1 +
...air? An Empirical Study of Fixed-Seed Training | 1 +
.../Are Transformers more robust than CNNs? | 1 +
...Diffusion: Learning Categorical Distributions | 1 +
...nal-external Learning and Contrastive Learning | 1 +
...ssing Fairness in the Presence of Missing Data | 1 +
...ith Transformers for Video Object Segmentation | 1 +
.../Associative Memories via Predictive Coding | 1 +
...Effect Identification with Multi-Armed Bandits | 1 +
...icy Evaluation with Misspecified Linear Models | 1 +
...on learning in finite Bayesian neural networks | 1 +
...Applications to Inference with Model Selection | 1 +
.../Asynchronous Decentralized Online Learning | 1 +
...ntralized SGD with Quantized and Local Updates | 1 +
...hastic Optimization Robust to Arbitrary Delays | 1 +
...tention Approximates Sparse Distributed Memory | 1 +
.../Attention Bottlenecks for Multimodal Fusion | 1 +
...ct Embeddings Enables Complex Visual Reasoning | 1 +
...iction Models for Data Minimization Compliance | 1 +
...on of Random Augmentations for Robust Training | 1 +
.../Augmented Shortcuts for Vision Transformers | 1 +
...aph for Unsupervised Medical Report Generation | 1 +
...: Optimized Loss Functions for Imbalanced Data | 1 +
... Neural Network with Explicit Link Information | 1 +
...Autobahn: Automorphism-based Graph Neural Nets | 1 +
...o-Correlation for Long-Term Series Forecasting | 1 +
...ry of Adaptive Attacks on Adversarial Defenses | 1 +
.../neurips/Automated Dynamic Mechanism Design | 1 +
...n for Generalization in Reinforcement Learning | 1 +
...scovery with Lie Algebra Convolutional Network | 1 +
...Automatic Unsupervised Outlier Model Selection | 1 +
...aphic Optimization: A Dynamic Barrier Approach | 0
...morphic Equivalence-aware Graph Neural Network | 1 +
...s Reinforcement Learning via Subgoal Curricula | 1 +
...rage-Reward Learning and Planning with Options | 1 +
...dimension-free convergence of gradient descent | 1 +
...: Evaluating Generated Text as Text Generation | 1 +
... Spanning Trees for Complex Constrained Domain | 1 +
...ional Approaches for Bayesian Causal Discovery | 1 +
...k for Coupons Allocation in E-commerce Market" | 1 +
... Structures Dynamically for Continual Learning | 1 +
... the goals, preferences, and actions of others | 1 +
...th Imperceptible Input and Latent Modification | 1 +
...e Prediction Updates: A Probabilistic Approach | 1 +
...omprehensive Metric for Point Cloud Completion | 0
...Hop Reasoning at Scale via Condensed Retrieval | 1 +
...Bandit Learning with Delayed Impact of Actions | 1 +
data/2021/neurips/Bandit Phase Retrieval | 1 +
.../neurips/Bandit Quickest Changepoint Detection | 1 +
.../Bandits with Knapsacks beyond the Worst Case | 1 +
data/2021/neurips/Bandits with many optimal arms | 1 +
data/2021/neurips/Batch Active Learning at Scale | 1 +
...ptimization with Deep Auto-Regressive Networks | 1 +
...alizes Representations in Deep Random Networks | 1 +
...-all Architecture Search with Robust Quantizer | 1 +
data/2021/neurips/Batched Thompson Sampling | 1 +
...ertainty Quantification for Causal Data Fusion | 1 +
.../Bayesian Adaptation for Covariate Shift | 0
data/2021/neurips/Bayesian Bellman Operators | 1 +
.../Bayesian Optimization of Function Networks | 1 +
...ian Optimization with High-Dimensional Outputs | 1 +
...fied priors with applications to meta-learning | 1 +
...aph Neural Networks via Confidence Calibration | 1 +
...rom the Void: Unsupervised Active Pre-Training | 1 +
...for Offline Multi-Agent Reinforcement Learning | 1 +
...f RL Problems, and Sample-Efficient Algorithms | 1 +
...t Pessimism for Offline Reinforcement Learning | 1 +
.../Beltrami Flow and Neural Diffusion on Graphs | 1 +
...lassification: All Roads Lead to Interpolation | 1 +
...h Spectral Filters via Bernstein Approximation | 1 +
...ntification in Contaminated Stochastic Bandits | 0
...ly Optimal Submodular Maximization in Parallel | 1 +
.../Best-case lower bounds in online learning | 1 +
...nts for Neural Network Robustness Verification | 1 +
... Algorithms for Individually Fair k-Clustering | 1 +
...Delusive Adversaries with Adversarial Training | 1 +
...ausal Discovery Benchmarks May Be Easy to Game | 1 +
...t Feedback in Online Multiclass Classification | 1 +
...nderstanding of Normalization in Deep Learning | 1 +
...hods for Calibrated Uncertainty Quantification | 1 +
...Analysis into Nonparametric Density Estimation | 1 +
...oncordant losses, via iterative regularization | 1 +
...ret Bounds for Episodic Reinforcement Learning | 1 +
...onparametric Tensor Completion via Sign Series | 1 +
...l Biases in Popular Generative Language Models | 1 +
...Bias and variance of the Bayesian-mean decoder | 1 +
.../neurips/Biological key-value memory networks | 0
.../2021/neurips/Black Box Probabilistic Numerics | 1 +
...lending for Arbitrary Stylized Face Generation | 1 +
...Blending Anti-Aliasing into Vision Transformer | 1 +
...oosting Approach for Continual Learning of VAE | 1 +
...rovably Efficient Bootstrapped Value Iteration | 1 +
.../neurips/Boost Neural Networks by Checkpoints | 1 +
data/2021/neurips/Boosted CVaR Classification | 1 +
data/2021/neurips/Boosting with Multiple Sources | 1 +
...tstrap Your Object Detector via Mixed Training | 1 +
.../Bootstrapping the Error of Oja's Algorithm | 1 +
... energy-based models with bidirectional bounds | 1 +
... Dilemma of Medical Image-to-image Translation | 1 +
...l Gradient Methods Using MaxIP Data-structures | 1 +
...hm for Bandits with Super Heavy-Tailed Payoffs | 1 +
...gret-Optimal Model-Free Reinforcement Learning | 3 +++
...ed barrier for cross-device federated learning | 1 +
... Construction with Deep Reinforcement Learning | 1 +
... Generative Models via Neural Stein Estimators | 1 +
...the-wild Data for Incremental Object Detection | 1 +
...ng and Imitation Learning: A Tale of Pessimism | 1 +
...and PAC-Bayes Theory in Few-Shot Meta-Learning | 1 +
... the Imitation Gap by Adaptive Insubordination | 1 +
... real-time flow prediction on neural manifolds | 1 +
...l Network Training via Boundary Example Mining | 1 +
...ByPE-VAE: Bayesian Pseudocoresets Exemplar VAE | 1 +
...on Modules for Generative Adversarial Networks | 1 +
...ex Optimization with Communication Compression | 1 +
...ith Continuous Augmented Positional Embeddings | 1 +
...etic-REINFORCE Multi-Sample Gradient Estimator | 1 +
...egation Transformers for Visual Correspondence | 1 +
...sion using a pseudo-Lagrange multiplier method | 1 +
...VS: Context-aware Controllable Video Synthesis | 1 +
...ence-based Pruning for Compact Neural Networks | 1 +
...Learning for Semi-Supervised Domain Adaptation | 1 +
.../CLIP-It! Language-Guided Video Summarization | 1 +
... reInforcement Learning On sub-Task curriculum | 1 +
... Text Sequences for Language Model Pretraining | 1 +
...bject and Hand Embedding Segmentation In Video | 0
...vative Offline Model-Based Policy Optimization | 1 +
...s Based on Patient Disease Class, Sex, and Age | 1 +
...odels for Probabilistic Time Series Imputation | 1 +
...s: A Novel Approach to Multi-Class Calibration | 1 +
...nd Consistency of Adversarial Surrogate Losses | 1 +
... Targets for Interventions in Neural Circuits? | 1 +
...ancing Label Noise Rates Considered Beneficial | 1 +
... Easy to Hard Problems with Recurrent Networks | 1 +
...contrastive learning avoid shortcut solutions? | 1 +
...sentation of syntactic structure in the brain? | 1 +
...sification networks know what they don't know? | 1 +
...ation loss? Quasiconvexity in ridge regression | 1 +
... and Adversarial Robustness of Neural Networks | 1 +
...es: Self-Supervised Capsules in Canonical Pose | 1 +
...arned Geometric Embeddings for Directed Graphs | 1 +
...ith self-supervised hyperbolic representations | 1 +
...ned submodular maximization for random streams | 1 +
.../Cardinality-Regularized Hawkes-Granger Model | 1 +
...n The Emergence Of Compositional Communication | 1 +
...ic Data Leakage in Vertical Federated Learning | 0
... to Generate Audio from a Single Short Example | 1 +
.../Causal Abstractions of Neural Networks | 1 +
.../Causal Bandits with Unknown Graph Structure | 1 +
...sal Effect Inference for Structured Treatments | 1 +
.../Causal Identification with Matrix Equations | 1 +
...or Event Pairs in Multivariate Point Processes | 1 +
...Improving Efficiency in Reinforcement Learning | 1 +
... Navigation by Continuous-time Neural Networks | 1 +
...nfer Treatment-Effects from Observational Data | 1 +
...y in Shared Multi-Agent Reinforcement Learning | 1 +
...obustness for Networks with Structured Outputs | 1 +
...stance Representation for Scene Text Detection | 1 +
...ss to Programmable Data Bias in Decision Trees | 1 +
...ties in High Dimensional Variational Inference | 1 +
...on via Multivariate Singular Spectrum Analysis | 1 +
.../Channel Permutations for N: M Sparsity | 1 +
...Of-Distribution Shifts in Deep Metric Learning | 1 +
...lure modes in physics-informed neural networks | 1 +
.../Characterizing the risk of fairwashing | 1 +
...ace of Solutions for Recurrent Neural Networks | 1 +
...Vision Transformers: An End-to-End Exploration | 1 +
...nett Inequality for the Weighted Majority Vote | 1 +
.../Choose a Transformer: Fourier or Galerkin | 1 +
...ca: Stochastic ReLUs for Private Deep Learning | 1 +
...lications in Adversarial Detection and Defense | 1 +
...ass-Incremental Learning via Dual Augmentation | 1 +
... Reconstruction of Dynamic Objects from Videos | 1 +
.../neurips/Clockwork Variational Autoencoders | 1 +
...ochastic Gradient Methods for Bilevel Problems | 1 +
...-making: A case study on organ transplantation | 1 +
...Clustering Effect of Adversarial Robust Models | 0
...in Inference-based Deep Reinforcement Learning | 1 +
...ion Transformer for Protein Contact Prediction | 1 +
...g Convolution and Attention for All Data Sizes | 1 +
...espondences for Robust PointCloud Registration | 1 +
...l Architecture Inspired by Continued Fractions | 0
...oarse-to-fine Animal Pose and Shape Estimation | 1 +
... Tool for the Training of Deep Neural Networks | 1 +
...ring Text-to-Image Generation via Transformers | 1 +
.../Collaborating with Humans without Human Data | 1 +
...ive Causal Discovery with Atomic Interventions | 1 +
...ogeneous, Asynchronous and Nonconvex Learning) | 1 +
...ertainty in Multi-Agent Trajectory Forecasting | 1 +
...ariational Bounds for Bayesian Neural Networks | 1 +
... Learning by Region Uncertainty Quantification | 1 +
... Segmentation: A Fully Differentiable Approach | 1 +
...re Exploration with Bottleneck Reward Function | 1 +
...ntion Transformer with Sparse Computation Cost | 1 +
...ilities via Confusion Matrices and Calibration | 1 +
...ayesian Optimization over Combinatorial Spaces | 1 +
...ous-time Models with Linear State Space Layers | 1 +
...ient SGD: From Local SGD to One-Shot Averaging | 1 +
...Efficient Low-Rank Hypercomplex Adapter Layers | 1 +
...onconvex-Strongly-Concave Min-Max Optimization | 1 +
...namical Systems with ODE-based Random Features | 1 +
...forcement Learning from Logical Specifications | 1 +
...ompositional Transformers for Scene Generation | 1 +
...nowledge Distillation with Causal Intervention | 1 +
.../neurips/Compressed Video Contrastive Learning | 1 +
...termining the Optimal Layer-wise Decomposition | 1 +
.../neurips/Compressive Visual Representations | 1 +
.../neurips/Computer-Aided Design as Language | 1 +
... for Multi-Hop Reasoning over Knowledge Graphs | 1 +
...er sub-Gaussian and sub-exponential conditions | 1 +
...itional Generation Using Polynomial Expansions | 1 +
...ks for Mesh-Based Modeling of Physical Systems | 1 +
... Gaussian Processes for Online Decision-making | 1 +
...ng from Demonstrations with Varying Optimality | 1 +
...or-Induced Multi-Source Free Domain Adaptation | 1 +
...verse Gradient Descent for Multi-task learning | 1 +
data/2021/neurips/Conformal Bayesian Computation | 1 +
...formal Prediction using Conditional Histograms | 1 +
.../neurips/Conformal Time-series Forecasting | 1 +
...meter-Free Convex-Concave Saddle-Point Solving | 1 +
... for Multi-Task Offline Reinforcement Learning | 1 +
... Offline Distributional Reinforcement Learning | 1 +
...y Regularization for Variational Auto-Encoders | 1 +
... and Sparse Regression with Oblivious Outliers | 1 +
...n-Parametric Methods for Maximizing Robustness | 1 +
...orks on Critical and Under-Represented Classes | 1 +
.../Constrained Robust Submodular Partitioning | 1 +
.../Container: Context Aggregation Networks | 1 +
...ations and Low-Regret Cutting-Plane Algorithms | 1 +
...tion with Self-attention for Visual Re-ranking | 1 +
.../neurips/Continual Auxiliary Task Learning | 1 +
...ontinual Learning via Local Module Composition | 1 +
...Benchmark For Continual Reinforcement Learning | 1 +
...ic Gradient Descents, and of Gossip Algorithms | 0
...oubly Constrained Batch Reinforcement Learning | 1 +
data/2021/neurips/Continuous Latent Process Flows | 1 +
.../neurips/Continuous Mean-Covariance Bandits | 1 +
... Discrete Optimization of Deep Neural Networks | 1 +
...modelling using non-parametric point processes | 1 +
...Video Domain Adaptation with Background Mixing | 1 +
data/2021/neurips/Contrastive Active Inference | 1 +
...ervised Learning with Extremely Limited Labels | 1 +
data/2021/neurips/Contrastive Laplacian Eigenmaps | 1 +
.../Contrastive Learning for Neural Topic Model | 1 +
...ning of Global and Local Video Representations | 0
...rcement Learning of Symbolic Reasoning Domains | 1 +
...isentangled Sequential Variational Autoencoder | 1 +
...ntrol Variates for Slate Off-Policy Evaluation | 1 +
...tinuous Optimization with Multiple Constraints | 1 +
...ling Neural Networks with Rule Representations | 1 +
...Gradient Descent under Infinite Noise Variance | 1 +
...hms for constrained weakly convex optimization | 1 +
...nvex Polytope Trees and its Application to VAE | 0
.../Convex-Concave Min-Max Stackelberg Games | 1 +
... Convolutional Network Robustness and Training | 1 +
...h Asynchronous Agents and Constrained Feedback | 1 +
.../Coordinated Proximal Policy Optimization | 1 +
...r Classification - Simplified and Strengthened | 1 +
.../Coresets for Clustering with Missing Values | 3 +++
.../Coresets for Decision Trees of Signals | 1 +
.../neurips/Coresets for Time Series Clustering | 1 +
...ng with Applications to Recovering Communities | 1 +
.../neurips/Corruption Robust Active Learning | 1 +
...er Network for Cortical Surface Reconstruction | 0
...ellar networks as decoupling neural interfaces | 1 +
...and Strategic Incentives in Allocation Markets | 1 +
... Policy Refinement Using Bayesian Optimization | 1 +
...Counterfactual Explanations Can Be Manipulated | 1 +
...n Sequential Decision Making Under Uncertainty | 1 +
...o Spurious Correlations in Text Classification | 0
...kelihood Estimation for Training Deep Networks | 1 +
...dient Estimators for Discrete Latent Variables | 1 +
...nd Edge Learning via Dynamic Graph Propagation | 1 +
...timation Without Private Covariance Estimation | 1 +
data/2021/neurips/Credal Self-Supervised Learning | 1 +
...ent Through Broadcasting a Global Error Vector | 1 +
... Neural Networks through Deep Feedback Control | 1 +
...r Cost-Efficient Visual Reinforcement Learning | 1 +
...o-localization with Layer-to-Layer Transformer | 1 +
...Multi-Party Computation Meets Machine Learning | 1 +
...ng via Demonstrations: Theory and Applications | 1 +
...ngled Recommendation with Noisy Multi-feedback | 1 +
...um Learning for Vision-and-Language Navigation | 1 +
.../neurips/Curriculum Offline Imitating Learning | 1 +
.../Cycle Self-Training for Domain Adaptation | 1 +
...ing Models for Few-Shot Conditional Generation | 1 +
... Data Using Causally-Aware Generative Networks | 1 +
...Material with a Hybrid Differentiable Renderer | 1 +
...: Spatial Invariance and Neural Tangent Kernel | 1 +
...e-Training Objective for Programming Languages | 1 +
... Method for Detecting Misclassification Errors | 1 +
...supervised Learning with A Few Labeled Samples | 1 +
.../DRIVE: One-bit Distributed Mean Estimation | 1 +
... SLAM for Monocular, Stereo, and RGB-D Cameras | 1 +
...ware Low-rank Compression for Large NLP Models | 1 +
...perts with Applications to Multi-Task Learning | 1 +
...: Acceleration, Convergence, and Stabilization | 1 +
...Bayesian Model Averaging under Covariate Shift | 1 +
.../Data Augmentation Can Improve Robustness | 1 +
... Compression for Cooperative Networked Control | 1 +
.../neurips/Data driven semi-supervised learning | 1 +
...t) Augmentations: A Lottery Ticket Perspective | 1 +
...stance Generation from Instance Discrimination | 1 +
...on with Infinitely Wide Convolutional Networks | 1 +
...CMC dynamics with the diffusion Stein operator | 1 +
...n Fixed-Confidence Linear Top-m Identification | 1 +
...Answering from Feature and Sample Perspectives | 1 +
...ugmentation for GAN Training with Limited Data | 1 +
...centralized Learning in Online Queuing Systems | 1 +
...entralized Q-learning in Zero-sum Markov Games | 1 +
...: Reinforcement Learning via Sequence Modeling | 1 +
...onditional Downscaling with Gaussian Processes | 1 +
.../Deconvolutional Networks on Graph Data | 1 +
...g the Depth and Scope of Graph Neural Networks | 1 +
...y Complex Wordplay Puzzles as a Target for NLP | 1 +
...e and Efficient Exploration with Deep Networks | 1 +
...ssian Mixture Model for Constrained Clustering | 1 +
.../neurips/Deep Contextual Video Compression | 1 +
...icit Duration Switching Models for Time Series | 1 +
...p Extended Hazard Models for Survival Analysis | 1 +
...xtrapolation for Attribute-Enhanced Generation | 1 +
...cy Evaluation in Continuous Treatment Settings | 1 +
...earning Through the Lens of Example Difficulty | 1 +
...: Finding Important Examples Early in Training | 1 +
.../Deep Learning with Label Differential Privacy | 1 +
...ntation for High-Resolution 3D Shape Synthesis | 1 +
...ent Temporal and Spatial Analysis of fMRI Data | 1 +
...g via Fusing Physical and Chemical Information | 1 +
...Deep Networks Provably Classify Data on Curves | 1 +
...as Point Estimates for Deep Gaussian Processes | 1 +
...ication to Confounded Bandit Policy Evaluation | 1 +
...rning at the Edge of the Statistical Precipice | 1 +
...p Residual Learning in Spiking Neural Networks | 1 +
...issimilarities as Powerful Visual Fingerprints | 1 +
...e-Carlo Planning in Reconnaissance Blind Chess | 1 +
...n using selective backpropagation through time | 1 +
...of model smoothness in anisotropic Besov space | 1 +
...d Expectation-Maximization for Blind Inversion | 1 +
...nication Framework for Federated Deep Learning | 1 +
...omposition of What and When Across Time Scales | 1 +
...ameter-Efficient Convolutional Neural Networks | 1 +
... Highly Structured and Sparse Linear Transform | 1 +
...e Communication Latency for Federated Learning | 0
...ds Practical Control in Cyber-Physical Systems | 1 +
.../Demystifying and Generalizing BinaryConnect | 1 +
data/2021/neurips/Denoising Normalizing Flow | 1 +
.../Dense Keypoints via Multiview Supervision | 0
...e Unsupervised Learning for Video Segmentation | 1 +
.../neurips/Densely connected normalizing flows | 1 +
... Implicit Regularization and Sample Complexity | 1 +
...ments for Stochastic Contextual Linear Bandits | 1 +
...rfactual Generators using Deep Model Inversion | 1 +
... Event Sequences with Temporal Point Processes | 1 +
...on Unlabeled Data with Self-training Ensembles | 1 +
...tyle: Exploring Behavioral Stylometry in Chess | 1 +
...hlights in Videos via Natural Language Queries | 0
...istribution Shifts in Bayesian Online Learning | 1 +
...al polynomials for sampling minibatches in SGD | 1 +
...BS: Differentiable Bayesian Structure Learning | 1 +
...ance Sampling and the Perils of Gradient Noise | 1 +
...lberg Models of Combinatorial Congestion Games | 1 +
.../neurips/Differentiable Learning Under Triage | 1 +
.../Differentiable Multiple Shooting Layers | 1 +
...ondecomposable Functions using Linear Programs | 1 +
.../2021/neurips/Differentiable Quality Diversity | 1 +
...entiable Simulation of Soft Multi-body Systems | 1 +
...t-Descent for Training Spiking Neural Networks | 0
.../neurips/Differentiable Spline Approximations | 1 +
...ferentiable Synthesis of Program Architectures | 1 +
...d Feature Selection based on a Gated Laplacian | 1 +
...erentiable rendering with perturbed optimizers | 1 +
... Langevin Diffusion and Noisy Gradient Descent | 1 +
...Differential Privacy Over Riemannian Manifolds | 1 +
...ical Risk Minimization under the Fairness Lens | 1 +
...sian Optimization with Distributed Exploration | 1 +
...tially Private Learning with Adaptive Clipping | 1 +
.../Differentially Private Model Personalization | 1 +
...ivate Multi-Armed Bandits in the Shuffle Model | 1 +
...erentially Private Sampling from Distributions | 1 +
... New Results in Convex and Non-Convex Settings | 1 +
.../Differentially Private n-gram Extraction | 1 +
.../Diffusion Models Beat GANs on Image Synthesis | 1 +
data/2021/neurips/Diffusion Normalizing Flow | 1 +
...plications to Score-Based Generative Modeling" | 1 +
.../Dimension-free empirical entropy estimation | 1 +
...sionality Reduction for Wasserstein Barycenter | 1 +
...ect Multi-view Multi-person 3D Pose Estimation | 1 +
.../neurips/Directed Graph Contrastive Learning | 1 +
.../2021/neurips/Directed Probabilistic Watershed | 1 +
...ve Latent Network Models Of Neural Populations | 1 +
... on Molecular Graphs via Synthetic Coordinates | 1 +
...rained Learning for Deep Graph Neural Networks | 1 +
...tworks with Hierarchical Voting Transformation | 1 +
...ions for Spatio-Temporal Graph Neural Networks | 1 +
...scovering and Achieving Goals via World Models | 1 +
...Discovery of Options via Meta-Learned Subgoals | 1 +
.../neurips/Discrete-Valued Neural Communication | 1 +
.../Disentangled Contrastive Learning on Graphs | 1 +
... from Noisy Data with Structured Nonlinear ICA | 1 +
...ion and the Prior in the Cold Posterior Effect | 1 +
...ncertainty Estimation Without Harming Accuracy | 1 +
...Process in Linear Graph Convolutional Networks | 1 +
...stilling Image Classifiers in Object Detectors | 1 +
...icit Drug Trafficker Detection on Social Media | 1 +
...tilling Object Detectors with Feature Richness | 1 +
...Adversarial Examples by Information Bottleneck | 1 +
...stributed Deep Learning In Open Collaborations | 1 +
...les per User: Sharp Rates and Phase Transition | 1 +
...achine Learning with Sparse Heterogeneous Data | 1 +
... Component Analysis with Limited Communication | 1 +
...ed Saddle-Point Problems Under Data Similarity | 0
...ero-Order Optimization under Adversarial Noise | 1 +
...gression: discrete, continuous, and in between | 1 +
... for Learning Uncertain Neural Dynamics Models | 1 +
...earning for Multi-Dimensional Reward Functions | 1 +
.../Distributionally Robust Imitation Learning | 1 +
..., Quantization Effects, and Frontier Integrals | 1 +
...Message Passing for Attribute with Heterophily | 1 +
...ve Learning with Strictly Proper Scoring Rules | 1 +
...Diversity Matters When Learning From Ensembles | 1 +
...ing Tasks Require Different Appearance Models? | 1 +
...t Gradients Highlight Discriminative Features? | 1 +
...ers Work? A Continuous Wasserstein-2 Benchmark | 1 +
...Really Perform Badly for Graph Representation? | 1 +
...ormers See Like Convolutional Neural Networks? | 1 +
...l Networks Really Help Adversarial Robustness? | 1 +
.../Does Knowledge Distillation Really Work? | 1 +
...p Training Over-parameterized Neural Networks? | 1 +
...mitigate biases caused by subpopulation shift? | 1 +
...ation Learning: What Transformations to Learn? | 1 +
...n Learning with Domain Density Transformations | 1 +
...N: M sparse schemes from dense neural networks | 1 +
...ate Generative Models with Sinkhorn Divergence | 1 +
...Machine Learning for Dynamic Treatment Effects | 1 +
...n for Local Treatment Effects with Instruments | 1 +
...y Robust Thompson Sampling with Linear Payoffs | 1 +
... the strange case of off-policy policy updates | 1 +
...Are Found within Randomly Initialized Networks | 1 +
...rvised Approach for Generating Neural Activity | 1 +
...gnal Between Sequences While Dropping Outliers | 1 +
...se the Expressiveness of Graph Neural Networks | 1 +
...mizing the Adaptive Regret of Convex Functions | 1 +
...ation of Sparse Variational Gaussian Processes | 1 +
...ype Network for Generalized Zero-Shot Learning | 1 +
.../Dual-stream Network for Visual Recognition | 1 +
.../DualNet: Continual Learning, Fast and Slow | 1 +
.../Dueling Bandits with Adversarial Sleeping | 1 +
.../neurips/Dueling Bandits with Team Comparisons | 1 +
...ce Learning for Reversible Machine Translation | 1 +
...form for Holistic Next-Generation Benchmarking | 1 +
... via De-Sparsified Orthogonal Matching Pursuit | 1 +
...tleneck for Robust Self-Supervised Exploration | 1 +
...ure from a spatial-temporal transmission model | 1 +
.../neurips/Dynamic Causal Bayesian Optimization | 1 +
...omain Few-Shot Recognition with Unlabeled Data | 1 +
...ynamic Grained Encoder for Vision Transformers | 1 +
.../Dynamic Inference with Neural Interpreters | 1 +
...ders for High-Resolution Semantic Segmentation | 1 +
...ization and Relay for Video Action Recognition | 1 +
data/2021/neurips/Dynamic Resolution Network | 1 +
...e Screening for Norm-Regularized Least Squares | 1 +
data/2021/neurips/Dynamic Trace Estimation | 1 +
...ntiable Physics Models from Video and Language | 1 +
data/2021/neurips/Dynamic influence maximization | 1 +
...ulti-agent communication with natural language | 1 +
...Transformers with Dynamic Token Sparsification | 1 +
...sserstein Barycenters for Time-series Modeling | 1 +
...entum Methods on Large-scale, Quadratic Models | 1 +
...inematic policy for egocentric pose estimation | 1 +
.../neurips/E(n) Equivariant Normalizing Flows | 1 +
...xplaining Deep Reinforcement Learning Policies | 1 +
... Better, and Practically Faster Error Feedback | 1 +
...Efficient Infinite-Depth Graph Neural Networks | 1 +
...ploration through Learned Language Abstraction | 1 +
...arly Convolutions Help Transformers See Better | 1 +
.../Early-stopped neural networks are consistent | 1 +
.../Edge Representation Learning with Hypergraphs | 1 +
...EditGAN: High-Precision Semantic Image Editing | 1 +
...a classifier by rewriting its prediction rules | 1 +
...rization by Kernelized Proximal Regularization | 1 +
...sian Process Classification by Error Reduction | 1 +
... Neural Networks with General ReLU Activations | 1 +
...ture learning via local Markov boundary search | 1 +
...terialization and Offloading for Training DNNs | 1 +
data/2021/neurips/Efficient Equivariant Network | 1 +
...ion, Allocation, and Triangular Discrimination | 1 +
...lization with Distributionally Robust Learning | 1 +
...ning of Discrete-Continuous Computation Graphs | 1 +
... Ascent Methods for Nonsmooth Minimax Problems | 1 +
...orward and Backward Propagation Sparsification | 1 +
... of Causal Effects by Deciding What to Observe | 1 +
...ssment of Neural Network Corruption Robustness | 1 +
...ining of Retrieval Models using Negative Cache | 1 +
...ing of Visual Transformers with Small Datasets | 1 +
... Linear Regression with Unknown Noise Variance | 1 +
...ficient and Accurate Gradients for Neural SDEs | 1 +
.../Efficient and Local Parallel Random Walks | 1 +
...ned sampling via the mirror-Langevin algorithm | 1 +
...tio-temporal regression models in neuroimaging | 1 +
... random fields under sparse linear constraints | 1 +
...tifying Task Groupings for Multi-Task Learning | 1 +
...ng One Hidden Layer ReLU Networks From Queries | 0
...iple of Loss Landscape of Deep Neural Networks | 1 +
.../Emergent Communication of Generalizations | 1 +
...ication under Varying Sizes and Connectivities | 0
...gent Discrete Communication in Semantic Spaces | 1 +
...via Just-in-Time Compilation and Vectorization | 1 +
...ge Style via Adversarial Feature Perturbations | 1 +
...volutional Features for Texture Representation | 1 +
...d Retriever for Open-Domain Question Answering | 1 +
data/2021/neurips/End-to-End Weak Supervision | 1 +
...nd-to-end Multi-modal Video Temporal Grounding | 1 +
...ata-driven regularization for inverse problems | 1 +
.../Ensembling Graph Predictions for AMR Parsing | 1 +
...ntropic Desired Dynamics for Intrinsic Control | 1 +
...Entropy-based adaptive Hamiltonian Monte Carlo | 1 +
...Zero-Shot Compositional Reinforcement Learning | 1 +
...ent Learning with Curiosity-driven Exploration | 1 +
...hines: The One-Sided Quasi-Perfect Equilibrium | 1 +
... the learning of Restricted Boltzmann Machines | 1 +
data/2021/neurips/Equivariant Manifold Flows | 1 +
...Compensated Distributed SGD Can Be Accelerated | 1 +
...r compensation for variance reduced algorithms | 1 +
...s by a simple gradient-descent based algorithm | 1 +
.../Escaping Saddle Points with Compressed SGD | 1 +
...radients of the Data Distribution by Denoising | 1 +
...reatment Effects via Single-cause Perturbation | 1 +
...ting the Long-Term Effects of Novel Treatments | 1 +
...the Unique Information of Continuous Variables | 1 +
...Performance Estimators of Neural Architectures | 1 +
...ion Attacks and Defenses in Federated Learning | 1 +
...Classification Models Against Bayes Optimality | 1 +
...el performance under worst-case subpopulations | 1 +
...ms for Learned and Rule-Based Agents in Hanabi | 1 +
...en Regularization Imposed by Self-Distillation | 1 +
...imodal Distributions in Deep Generative Models | 1 +
... Meta-Learning and Hyperparameter Optimization | 1 +
...Large-Scale Benchmark for Evolving Soft Robots | 1 +
...he Exponential Mechanism with Artificial Atoms | 1 +
...stributions of finite Bayesian neural networks | 1 +
.../Excess Capacity and Backdoor Poisoning | 1 +
...o Vision with Cross-Modal Contrastive Learning | 1 +
...eter Optimization via Partial Dependence Plots | 1 +
...tent Representations with a Corpus of Examples | 1 +
...rhinal cortex with task-driven neural networks | 1 +
...sed Data Augmentation for Image Classification | 1 +
...eward Design for Reinforcement Learning Agents | 1 +
...e gradient descent training of neural networks | 1 +
...' Theorem to Compare Probability Distributions | 1 +
...in Secure Cross-Platform Social Recommendation | 1 +
...ific Features to Enhance Domain Generalization | 1 +
...ethods Globally: Adaptive Sample Size Approach | 1 +
... Under Utility Constraints in Sequential Games | 1 +
...ploiting a Zoo of Checkpoints for Unseen Tasks | 1 +
...od Structure for Source-free Domain Adaptation | 1 +
...petition: Convergence with Bounded Rationality | 1 +
...s of Adversarially Robust Deep Neural Networks | 1 +
...r Weakly-Supervised Audio-Visual Video Parsing | 1 +
...ensic Dental Identification with Deep Learning | 1 +
...riational Autoencoder for Interaction Modeling | 1 +
...ng the Limits of Out-of-Distribution Detection | 1 +
...unds for Risk-Sensitive Reinforcement Learning | 1 +
...ably Efficient for Decentralized Deep Training | 1 +
...Two Learning Models and Adversarial Robustness | 1 +
...al Networks with Differentiable Contact Models | 1 +
...ion-Aware Local Features by Learning to Deform | 1 +
...tored Multi-Agent Centralised Policy Gradients | 1 +
.../FINE Samples for Learning with Noisy Labels | 1 +
...n Federated Learning from a Client Perspective | 1 +
.../FLEX: Unifying Evaluation for Few-Shot NLP | 1 +
... Decomposed Near-field and Far-field Attention | 1 +
...ing Structure for Efficient Learning in MOMDPs | 1 +
...Algorithms for Multi-Agent Multi-Armed Bandits | 1 +
... Classification with Adversarial Perturbations | 1 +
.../neurips/Fair Clustering Under a Bounded Cost | 1 +
.../Fair Exploration via Axiomatic Bargaining | 1 +
.../Fair Scheduling for Time-dependent Resources | 1 +
...ial Selection Using Supervised Learning Models | 1 +
data/2021/neurips/Fair Sortition Made Transparent | 1 +
...n Invex Relaxation for a Combinatorial Problem | 1 +
.../neurips/Fairness in Ranking under Uncertainty | 1 +
.../Fairness via Representation Neutralization | 1 +
...g by Similarity-based Consistency Optimization | 1 +
...for Infinite-Horizon Markov Decision Processes | 1 +
...ance Using Concentration of Random Projections | 1 +
...Fast Axiomatic Attribution for Neural Networks | 1 +
...an Cox Processes via Path Integral Formulation | 1 +
...st Certified Robust Training with Short Warmup | 1 +
...artition Function with Weak Mixing Time Bounds | 1 +
...ructured Nonconvex-Nonconcave Minimax Problems | 1 +
...he Presence of Arbitrary Device Unavailability | 1 +
...rial Attacks through Adaptive Norm Constraints | 1 +
...ng for Extreme Multi-label Text Classification | 1 +
... Competitive Games with Entropy Regularization | 1 +
...cations to Sparse Regression in Bioinformatics | 1 +
.../neurips/Fast Pure Exploration via Frank-Wolfe | 1 +
...ng in Congestion Games via Exponential Weights | 0
...Stochastic Compositional Optimization Problems | 1 +
... Lumigraph Representations using Meta Learning | 1 +
...egative Tensors Using Mean-Field Approximation | 1 +
... Differentially Private-SGD via JL Projections | 1 +
... algorithms for low-rank tensor decompositions | 1 +
...ates for prediction with limited expert advice | 1 +
...dit Alignment for Automatic Speech Recognition | 1 +
...Lower Bounds for the Worst-Case Expected Error | 1 +
...ural Networks under Spherically Symmetric Data | 1 +
.../neurips/Faster Matchings via Learned Duals | 1 +
...rk Training with Approximate Tensor Operations | 1 +
...n-asymptotic Convergence for Double Q-learning | 1 +
...mization using Jacobi-based eigenvalue methods | 1 +
...nforcement Learning with Theoretical Guarantee | 1 +
...for Nonconvex Federated Composite Optimization | 1 +
...rated Graph Classification over Non-IID Graphs | 1 +
..., Baselines, and Connections to Weight-Sharing | 1 +
.../neurips/Federated Linear Contextual Bandits | 1 +
...Task Learning under a Mixture of Distributions | 1 +
...nstruction: Partially Local Federated Learning | 1 +
... Vision Transformer for COVID-19 CXR Diagnosis | 0
...eterogeneity mitigation and variance reduction | 0
.../Few-Round Learning for Federated Learning | 1 +
...a-Driven Algorithms for Low Rank Approximation | 1 +
...t Detection via Association and DIscrimination | 1 +
... Segmentation via Cycle-Consistent Transformer | 1 +
.../Finding Bipartite Components in Hypergraphs | 1 +
...pecific Degradations in Blind Super-Resolution | 1 +
...for Reducing Distortions of Hard-label Attacks | 1 +
...ion-Making via Expected Conditional Covariance | 1 +
...ing Input Features with Predictive Information | 1 +
...ero-Shot Learning with DNA as Side Information | 1 +
...zation Analysis of Inductive Matrix Completion | 0
...of Average-Reward TD Learning and $Q$-Learning | 1 +
... TD-Learning via Generalized Bellman Operators | 1 +
...ith a differentiable spiking network simulator | 1 +
...ating Improvements in Machine-Learning Systems | 1 +
...der heterogeneous targets with Ordered Dropout | 1 +
... Projection Memory Benefits Continual Learning | 1 +
...vised Learning with Curriculum Pseudo Labeling | 1 +
data/2021/neurips/Flexible Option Learning | 1 +
...for Non-Iterative Diverse Candidate Generation | 1 +
...Long-Range Interactions in Vision Transformers | 1 +
...s across covariates instead of across datasets | 1 +
...ess of Neural Networks to Weight Perturbations | 1 +
...ion-Forgetting Trade-off in Continual Learning | 1 +
...composition and Learning Halfspaces with Noise | 1 +
... Symbolic Languages for Model Interpretability | 1 +
...operties of Stochastic Optimization Algorithms | 1 +
... RNN as a kernel method: A neural ODE approach | 1 +
...lysis to Self-supervised Graph Neural Networks | 1 +
...e Re-Sampling Strategies in Stochastic Bandits | 0
...andom forests and when they are Shapley values | 1 +
...orks for Parametric Image Restoration Problems | 1 +
...orcement Learning via Learned Fourier Features | 1 +
...ference based on Stochastic Process Generators | 1 +
...ledge Transfer for Low-resource Drug Discovery | 1 +
.../Fuzzy Clustering with Similarity Queries | 1 +
... Private Aggregation of Teacher Discriminators | 1 +
...t Representations without Iterative Refinement | 1 +
... Network for Multi-agent Trajectory Prediction | 1 +
...ment Reconstruction from Point Cloud Sequences | 1 +
data/2021/neurips/Gauge Equivariant Transformer | 1 +
...re Network for Single Image Defocus Deblurring | 1 +
...irectional Graph Neural Networks for Molecules | 1 +
...ization: Geometric Analysis and Sharper Bounds | 1 +
...neral Nonlinearities in SO(2)-Equivariant CNNs | 1 +
... from Observation via Inferring Goal Proximity | 1 +
.../Generalizable Multi-linear Attention Network | 1 +
...ta-Learning: An Information-Theoretic Analysis | 1 +
...n Bounds for (Wasserstein) Robust Optimization | 1 +
... Using Negative Sampling: Linear vs Hyperbolic | 0
...a-Learning via PAC-Bayes and Uniform Stability | 1 +
...e Crossover from the Noiseless to Noisy Regime | 1 +
...ization Guarantee of SGD for Pairwise Learning | 1 +
...earning Algorithms: Recurring and Unseen Tasks | 1 +
...eighting via Class-Level Gradient Manipulation | 1 +
...rsarially Robust and Efficient Neural Networks | 1 +
...Divergence Loss for Learning with Noisy Labels | 1 +
...Linear Bandits with Local Differential Privacy | 1 +
...Proximal Policy Optimization with Sample Reuse | 1 +
...alized Shape Metrics on Neural Representations | 1 +
...bject Detection via SVD-Dictionary Enhancement | 1 +
... Navigation in Partially-Revealed Environments | 1 +
...cy Fields for 3D Surface-Aware Image Synthesis | 1 +
...native: Rethinking The Meta-Continual Learning | 1 +
...eric Neural Architecture Search via Regression | 1 +
...Generation of Molecular 3D Conformer Ensembles | 1 +
.../Geometry Processing with Neural Fields | 1 +
.../neurips/Glance-and-Gaze Vision Transformer | 1 +
...t for Asymmetric Low-Rank Matrix Factorization | 1 +
...ization for Nonlinear Model Predictive Control | 1 +
...librium in Classes of Nonconvex Zero-Sum Games | 1 +
...lobal Filter Networks for Image Classification | 1 +
...am Search for Neural Abstractive Summarization | 1 +
...Sample Efficient Neural Function Approximation | 1 +
...formers with Recurrent Fast Weight Programmers | 1 +
... Neural Active Learning with Fisher Embeddings | 1 +
...d Classification Measures and How to Find Them | 1 +
...cation Using Gradients for Task Representation | 1 +
...ral Networks for Stable and Efficient Training | 1 +
... Nets: Margin Maximization and Simplicity Bias | 1 +
...tee Fairness in Collaborative Machine Learning | 1 +
...Gradient Inversion with Generative Image Prior | 1 +
...t Image Corruption for Learning-based Steering | 1 +
...amples for Online Task-free Continual Learning | 1 +
...Hyperparameter Optimization Over Long Horizons | 1 +
...daptation without Indexed Intermediate Domains | 1 +
.../Grammar-Based Grounded Lexicon Learning | 1 +
.../Graph Adversarial Self-Supervised Learning | 1 +
...le Architecture Search with Structure Learning | 1 +
.../Graph Neural Networks with Adaptive Residual | 1 +
...ph Neural Networks with Local Graph Parameters | 1 +
...Predictive Uncertainty for Node Classification | 1 +
...s for Representation Learning on Textual Graph | 1 +
.../Graphical Models in Heavy-Tailed Markets | 1 +
...ithms for Active Sequential Hypothesis Testing | 1 +
...s with Faster Explicit Superlinear Convergence | 1 +
...ntation Similarity Through Statistical Testing | 0
...ing Spatio-Temporal Language with Transformers | 1 +
...ages: invariance stems from variations in data | 1 +
data/2021/neurips/Group Equivariant Subsampling | 1 +
...nd Temporal Reconstruction of Humans in Motion | 1 +
...bal Parameters for Neural Posterior Estimation | 1 +
...esolution Vision Transformer for Dense Predict | 0
...antic-Visual Adaptation for Zero-Shot Learning | 1 +
...ing Home Assistants to Rearrange their Habitat | 1 +
...with Non-Newtonian Momentum for Rapid Sampling | 1 +
... Long-tailed Feature Distribution in AdderNets | 1 +
...rd-Attention for Scalable Image Classification | 1 +
...t Latency Prediction for NAS via Meta-Learning | 1 +
.../neurips/Hash Layers For Large Sparse Models | 1 +
.../Heavy Ball Momentum for Conditional Gradient | 1 +
...vy Ball Neural Ordinary Differential Equations | 1 +
...essibility of Overparametrized Neural Networks | 1 +
...igenspectra of More Realistic Nonlinear Models | 1 +
...ed Bandits: Closing the Gap and Generalization | 1 +
.../Heuristic-Guided Reinforcement Learning | 1 +
...: O(1)-Approximation for Well-Clustered Graphs | 1 +
...cal Reinforcement Learning with Timed Subgoals | 1 +
.../Hierarchical Skills for Efficient Exploration | 1 +
...ds for Line Search Based on Stochastic Oracles | 1 +
...onvex Stochastic Optimization with Heavy Tails | 1 +
...to Capture Filtrations of Stochastic Processes | 1 +
...g: Experience Replay for Sparse Reward Meta-RL | 1 +
...Transformer for Vision-and-Language Navigation | 1 +
...tive RL and Fragment-based Molecule Generation | 1 +
...ion affects Optimization for Linear Regression | 1 +
data/2021/neurips/How Does it Sound? | 0
...Fine-Tuning Allows for Effective Meta-Learning | 1 +
...ule Networks Be for Systematic Generalization? | 1 +
...ance Predictors in Neural Architecture Search? | 1 +
... Be Fine-Tuned Towards Adversarial Robustness? | 1 +
...ght Can PAC-Bayes be in the Small Data Regime? | 1 +
...pport Causal Understanding of CNN Activations? | 1 +
...n classical multidimensional scaling go wrong? | 1 +
...tecture Impact its Robustness to Noisy Labels? | 1 +
...c reasoning knowledge to learn new algorithms? | 1 +
.../Human-Adversarial Visual Question Answering | 1 +
...al Semi-Bandits and Adversarial Linear Bandits | 1 +
... Compact and Expressive Probabilistic Circuits | 1 +
...rbolic Busemann Learning with Ideal Prototypes | 1 +
... Procrustes Analysis Using Riemannian Geometry | 1 +
... and Community Selection for Objects Retrieval | 1 +
...timization Is Deceiving Us, and How to Stop It | 1 +
...yperparameter Tuning is All You Need for LISTA | 1 +
...edge Graph Completion Using Pair-Wise Encoding | 1 +
...Q-Learn: Inverse soft-Q Learning for Imitation | 1 +
...n't: A test case of natural language inference | 1 +
...ntifiability in inverse reinforcement learning | 1 +
...dels for Missing Not at Random Data Imputation | 1 +
...servational Studies with Covariate Information | 1 +
...s for the Non-Gaussian and Heterogeneous Cases | 1 +
...ized Condorcet Winner in Multi-dueling Bandits | 1 +
...ing Natural Out-of-Context Prediction Problems | 1 +
.../neurips/Identity testing for Mallows model | 1 +
...Image Generation using Continuous Filter Atoms | 1 +
...l Diffusion for Autoregressive Image Synthesis | 1 +
...ally Elastic Stochastic Differential Equations | 1 +
.../neurips/Imitation with Neural Density Models | 1 +
... Networks: a Provable Benefit of Stochasticity | 1 +
...-Based Experimental Design without Likelihoods | 1 +
...ptimal Algorithms for Stochastic Shortest Path | 1 +
data/2021/neurips/Implicit Generative Copulas | 1 +
...ough Discrete Exponential Family Distributions | 1 +
...arization in Matrix Sensing via Mirror Descent | 1 +
...Implicit SVD for Graph Representation Learning | 1 +
...sponse Alignment for Partial Domain Adaptation | 1 +
...zation: The Impact of Depth and Early Stopping | 1 +
...ncy Measure for Unsupervised Domain Adaptation | 1 +
...reen Content Image Continuous Super-Resolution | 1 +
...presentation learning with synaptic plasticity | 1 +
...arallel Tree Search with Off-Policy Correction | 1 +
...Algorithms for Power Means in Euclidean Spaces | 1 +
... via new Ordered Contention Resolution Schemes | 1 +
...pe SVM with Sparse Multi-Kernel Representation | 1 +
...Regret Bounds for Tracking Experts with Memory | 1 +
... Robustness for Fine-tuning in Neural Networks | 1 +
.../Improved Transformer for High-Resolution GANs | 1 +
...Sets for Linear Bandits and Linear Mixture MDP | 1 +
...scaded Networks and a Temporal-Difference Loss | 1 +
...h the Relationship with Adversarial Robustness | 1 +
...els with Dual-System, Neuro-Symbolic Reasoning | 1 +
...Networks by Decoding Representations to Inputs | 1 +
...l Reinforcement Learning via Stored Embeddings | 1 +
...al Coverage via Orthogonal Quantile Regression | 1 +
...ing on Imbalanced Data via Open-World Sampling | 0
...g Interpretability by Saliency Guided Training | 1 +
...h Imaginary Tasks from Latent Dynamics Mixture | 1 +
.../Improving Robustness using Generated Data | 1 +
...ith Automated Unsupervised Outlier Arbitration | 1 +
...ations via Augmentation-Aware Self-Supervision | 1 +
...s by A Token-based Generator with Transformers | 1 +
... in VAE latent space using decoder uncertainty | 1 +
...cs Organized by Astrocyte-modulated Plasticity | 1 +
...ype Propagation for Zero-Shot Compositionality | 1 +
...Independent mechanism analysis, a new concept? | 1 +
...imum Empirical Divergence for Unimodal Bandits | 1 +
... Privacy Accounting via a R\303\251nyi Filter" | 1 +
...ime Horizon Safety of Bayesian Neural Networks | 1 +
...tterns for Explaining Information Flow in BERT | 1 +
...: Information-Aware Graph Contrastive Learning | 1 +
...ted Reward Learning for Reinforcement Learning | 1 +
...on Directed Sampling for Sparse Linear Bandits | 1 +
...wer: Intrinsic Control via Information Capture | 1 +
...on: can adaptive processing of gradients help? | 1 +
...ation bounds for black-box learning algorithms | 1 +
...al Knowledge Distillation for Object Detection | 1 +
data/2021/neurips/Instance-Conditioned GAN | 1 +
...Lipschitz Optimization with Error Certificates | 1 +
.../Instance-Dependent Partial Label Learning | 1 +
...noise Learning under a Structural Causal Model | 1 +
...mal Mean Estimation Under Differential Privacy | 1 +
...ral ODEs: Pharmacology and Disease Progression | 1 +
...ee Path in Transformer for Code Representation | 1 +
...Label Cleaning with Example-based Explanations | 1 +
...ious Agent: Learning Task-Agnostic Exploration | 1 +
... Momentum Contrastive Self Supervised Learning | 1 +
...ust generalization even when there is no noise | 1 +
...generic visual processor emerging on the side) | 1 +
... Quality of DNNs for 3D Point Cloud Processing | 1 +
... Inference with Tractable Probabilistic Models | 1 +
.../Intriguing Properties of Contrastive Losses | 1 +
.../Intriguing Properties of Vision Transformers | 1 +
...Homology and Generalization in Neural Networks | 1 +
...ive Distillation for Robust Question Answering | 1 +
...tleneck for Out-of-Distribution Generalization | 1 +
... Imitation Learning for Generalizable Policies | 1 +
...aracteristics of the Human Sensorimotor System | 1 +
...raging Pre-trained Contrastive Representations | 1 +
... Continuous State Space with Formal Guarantees | 1 +
data/2021/neurips/Inverse-Weighted Survival Games | 1 +
...nvertible DenseNets with Concatenated LipSwish | 1 +
...valuation Broken? The Incoherence of Coherence | 1 +
...ing Continuous Control with Bernoulli Policies | 1 +
...s for Convergent Solutions to Inverse Problems | 1 +
.../Iterative Amortized Policy Optimization | 1 +
...sence of Latent Confounders and Selection Bias | 1 +
...Connecting Probability Estimation for Networks | 1 +
...hetic Data: Unifying Framework and New Methods | 1 +
.../2021/neurips/Iterative Teacher-Aware Learning | 1 +
.../neurips/Iterative Teaching by Label Synthesis | 1 +
...is Pursuit with Global Linear Convergence Rate | 1 +
...eural Network Depth and Dropout Regularization | 1 +
...jects and Relations for Scene Graph Generation | 1 +
...akly Supervised RGB-D Salient Object Detection | 1 +
...and input optimization in equilibrium networks | 1 +
.../K-Net: Towards Unified Image Segmentation | 1 +
...Reasoning for Zero-Shot Coordination in Hanabi | 1 +
...t Flow for Probabilities with Disjoint Support | 1 +
...er Incomplete Graphs via Graphs Neural Network | 1 +
...ll: Trajectory Attention in Video Transformers | 1 +
data/2021/neurips/Kernel Functional Optimisation | 1 +
.../Kernel Identification Through Transformers | 1 +
.../Kernelized Heterogeneous Risk Minimization | 1 +
data/2021/neurips/Knowledge-Adaptation Priors | 1 +
...pired 3D Scene Graph Prediction in Point Cloud | 1 +
...ks via Efficient in-situ Subspace Optimization | 1 +
...tion via Augmentation for Deep Active Learning | 1 +
...al Systems that Generalize Across Environments | 1 +
...ti-purpose Learnt Low-dimensional Binary Codes | 1 +
...ve Hashing Accelerated Simulation and Learning | 1 +
...tition-based Extreme Multilabel Classification | 1 +
...se SGD Provably Prefers Flat Global Minimizers | 1 +
...onsistency in overfitted generalized $k$-means | 1 +
...tive Classification under Overparameterization | 1 +
...etworks for Multi-Node Representation Learning | 1 +
...eration in Hierarchical Reinforcement Learning | 1 +
...gation with Fine-Grained Alignment Supervision | 1 +
...improved power method for tensor decomposition | 1 +
...f the effects of mutations on protein function | 1 +
...lace Redux - Effortless Bayesian Deep Learning | 1 +
...aphs: New Benchmarks and Strong Simple Methods | 1 +
...ith Fourier Features and Tensor Decompositions | 1 +
.../Large-Scale Unsupervised Object Discovery | 1 +
.../Large-Scale Wasserstein Gradient Flows | 1 +
... for Least-Squares in the Interpolation regime | 1 +
...st-iterate Convergence in Extensive-Form Games | 1 +
...fast computation with arbitrarily slow neurons | 1 +
...ram Synthesis Beyond Domain-Specific Languages | 1 +
...tent Matters: Learning Deep State-Space Models | 1 +
.../Lattice partition recovery with dyadic CART | 1 +
...ty of Linear Thresholds from Label Proportions | 1 +
... Multi-dimensional Spatial Positional Encoding | 1 +
...pproach for High-Dimensional Outlier Detection | 1 +
...Correspondence via Canonical Point Autoencoder | 1 +
...nt Learning with Zero Training-time Violations | 1 +
...resentation for Out-of-Distribution Prediction | 1 +
...ive Policies to Solve NP-hard Routing Problems | 1 +
...al Networks using DiscriminAtive Masking (DAM) | 1 +
...ning Conjoint Attentions for Graph Neural Nets | 1 +
...entation via Disentangled Feature Augmentation | 1 +
...gled Representations for Semantic Segmentation | 1 +
.../Learning Disentangled Behavior Embeddings | 1 +
...Collaboration Graph for Multi-Agent Perception | 1 +
...Diverse Policies in MOBA Games via Macro-Goals | 1 +
...Representations in Goal-conditioned Block MDPs | 1 +
...rain Connectome with Spatio-Temporal Attention | 1 +
...ibria in Matching Markets from Bandit Feedback | 1 +
...Equivariant Stein Variational Gradient Descent | 1 +
.../Learning Fast-Inference Bayesian Networks | 1 +
...omain Approximation for Binary Neural Networks | 1 +
...Models: Precise Asymptotics in High-dimensions | 1 +
...rning Generalized Gumbel-max Causal Mechanisms | 1 +
...rgy-Based Latent Space for Saliency Prediction | 1 +
.../2021/neurips/Learning Graph Cellular Automata | 1 +
...ing Graph Models for Retrosynthesis Prediction | 1 +
...zation Problems: A Data Generation Perspective | 1 +
...ject Detection via Kullback-Leibler Divergence | 1 +
... Rule Sets: A Submodular Optimization Approach | 1 +
...aph-based World Models of Textual Environments | 1 +
...hborhood Search Policy for Integer Programming | 1 +
...tial Decision Making by Reinforcement Learning | 0
...e Abstractions for Deep Reinforcement Learning | 1 +
.../Learning Models for Actionable Recourse | 1 +
...etric Volterra Kernels with Gaussian Processes | 1 +
.../Learning Optimal Predictive Checklists | 1 +
...nded Constraint Violation for Constrained MDPs | 1 +
...annian metric for disease progression modeling | 1 +
...tterns of Human Brain across Many fMRI Studies | 1 +
...tic Representations to Verify Hardware Designs | 1 +
...ing Signal-Agnostic Manifolds of Neural Fields | 1 +
.../Learning Space Partitions for Path Planning | 1 +
...artially Observed or Delayed Dynamical Systems | 1 +
...rom Random Deep Action-conditional Predictions | 1 +
...by Minimizing a PAC-Bayes Generalization Bound | 1 +
...ly Teacher Networks for Knowledge Distillation | 1 +
...xplain Generalisation in Graph Neural Networks | 1 +
...earning Transferable Adversarial Perturbations | 1 +
...Cloud Detection via 3D Contrastive Co-training | 1 +
...s in Panels with General Intervention Patterns | 1 +
...Representation for Deep Reinforcement Learning | 1 +
...Single Neuron with Bias Using Gradient Descent | 1 +
.../neurips/Learning and Generalization in RNNs | 1 +
...ealistic datasets with a teacher-student model | 1 +
...ing and Reasoning for Video Question Answering | 1 +
... in Multi-Stage Decentralized Matching Markets | 1 +
...erative Configurable Markov Decision Processes | 1 +
...ly observable Markov games with perfect recall | 0
...l trajectories via augmented behavioral models | 1 +
...rning latent causal graphs via mixture oracles | 1 +
...ficient for Estimating (Some) Graph Parameters | 1 +
...t attractor structure in decision-making tasks | 1 +
...imal Tikhonov regularizer for inverse problems | 1 +
...ent Domains for Adaptive Semantic Segmentation | 1 +
...ing to Assimilate in Chaotic Dynamical Systems | 1 +
...Example Solutions for Neural Program Synthesis | 1 +
.../neurips/Learning to Compose Visual Relations | 1 +
...Draw: Emergent Communication through Sketching | 1 +
data/2021/neurips/Learning to Elect | 1 +
...niversal Plan-Conditioned Policies in Robotics | 1 +
...a Pixel-level Noise-aware Adversarial Training | 1 +
...nerate Visual Questions with Noisy Supervision | 1 +
...nd Multi-Agent Communication with Autoencoders | 1 +
...ems with Dual-Aspect Collaborative Transformer | 1 +
...Dense Gaussian Processes for Few-Shot Learning | 1 +
.../neurips/Learning to Learn Graph Topologies | 1 +
... Predict Trustworthiness with Steep Slope Loss | 1 +
...ing to Schedule Heuristics in Branch and Bound | 1 +
.../neurips/Learning to See by Looking at Noise | 1 +
...enous Events for Marked Temporal Point Process | 1 +
...ms as Interpretable and Generalizable Policies | 1 +
...al Networks Through the Information Bottleneck | 1 +
.../neurips/Learning to dehaze with polarization | 1 +
...ng to delegate for large-scale vehicle routing | 1 +
...adient sparsity in meta and continual learning | 1 +
...rithmic Supervision via Continuous Relaxations | 1 +
...rning with Holographic Reduced Representations | 1 +
.../Learning with Labeling Induced Abstentions | 1 +
... Noisy Correspondence for Cross-modal Matching | 1 +
.../2021/neurips/Learning with User-Level Privacy | 1 +
...with Multiple States via New Ski Rental Bounds | 1 +
...learn non-convex piecewise-Lipschitz functions | 1 +
.../Least Square Calibration for Peer Reviews | 1 +
...ath for Cross-Domain Cold-Start Recommendation | 1 +
... Approximate Inference in Combinatorial Spaces | 1 +
...Level Object Pose Estimation from Point Clouds | 1 +
...ral Correlations in Sparsified Mean Estimation | 1 +
...Language Models for Abstract Textual Reasoning | 1 +
...ptation via Consolidated Internal Distribution | 1 +
...presentations with Single-Evaluation Rendering | 1 +
...layer neural networks with mean field training | 1 +
...ling Client Heterogeneity and Sparse Gradients | 1 +
... High-dimensional Vector Autoregressive Models | 1 +
...aming Model: Improved Bounds for Heavy Hitters | 1 +
...babilistic Solution of Boundary Value Problems | 0
... Synthesis with Visual Context Attentional GAN | 1 +
...t-Decodable Mean Estimation in Nearly-PCA Time | 2 ++
...lestone Classes are Privately Online Learnable | 1 +
... Regret Minimization in Reinforcement Learning | 1 +
...o-Encoders Using Jacobian $L_1$ Regularization | 1 +
...al Explanation of Dialogue Response Generation | 1 +
data/2021/neurips/Local Hyper-Flow Diffusion | 1 +
...ure Learning in Neural Networks Beyond Kernels | 1 +
... using self-supervised contrastive predictions | 1 +
...Local policy search with Bayesian optimization | 1 +
data/2021/neurips/Locality Sensitive Teaching | 1 +
...ity in convolutional teacher-student scenarios | 1 +
.../neurips/Localization with Sampling-Argmax | 1 +
.../Localization, Convexity, and Star Aggregation | 1 +
...ibution Detection using Deep Generative Models | 1 +
... Prediction Intervals for Deep Learning Models | 1 +
...ation of functionals of discrete distributions | 0
.../Locally private online change point detection | 1 +
.../Logarithmic Regret from Sublinear Hints | 1 +
...ithmic Regret in Feature-based Dynamic Pricing | 1 +
...t-Term Transformer for Online Action Detection | 1 +
...Efficient Transformers for Language and Vision | 1 +
...rounding of Narrations in Instructional Videos | 1 +
...anations with Sobol-based Sensitivity Analysis | 1 +
...for Contrastive Semantic Segmentation Learning | 1 +
... application to particle variational inference | 1 +
.../Lossy Compression for Lossless Prediction | 1 +
... Optimization for Temporal Action Localization | 1 +
...raints for Fast Inference in Structured Models | 1 +
...ooth and Low-Rank Matrix Optimization Problems | 1 +
data/2021/neurips/Low-Rank Subspaces in GANs | 1 +
...epresentations is Reflected in Brain Responses | 1 +
...alized Optimization Over Time-Varying Networks | 1 +
...ing Methods for Well-Conditioned Distributions | 1 +
... the Pseudo-Dimension of Tensor Network Models | 1 +
.../neurips/Luna: Linear Unified Nested Attention | 1 +
...ree Approximations of Second-Order Information | 1 +
...via Maximizing Deviation from Explored Regions | 1 +
...g with a Team of Reinforcement Learning Agents | 1 +
...ion-Aware Unit for Video Prediction and Beyond | 1 +
...Text and Human Text using Divergence Frontiers | 1 +
...nference via Uncorrected Hamiltonian Annealing | 1 +
...LOT: Multimodal Neural Script Knowledge Models | 1 +
...Economic Sparse Training Framework on the Edge | 1 +
...state similarity for Markov decision processes | 1 +
...mputation via Learning Missing Data Mechanisms | 1 +
.../MLP-Mixer: An all-MLP Architecture for Vision | 1 +
...OMA: Multi-Object Multi-Actor Activity Parsing | 1 +
...pervised Transformer for Visual Representation | 1 +
...g for Variance Reduction in Online Experiments | 1 +
...rackets for forecasting irreversible processes | 1 +
...Attention in Deep Reinforcement Learning Tasks | 1 +
.../MagNet: A Neural Network for Directed Graphs | 1 +
...ork for Verifying Probabilistic Specifications | 1 +
...terfactual) Difference One Rationale at a Time | 1 +
...online learning for optimal allocation of time | 1 +
...ence: a Framework for Comparing Data Manifolds | 1 +
.../Manipulating SGD with Data Ordering Attacks | 1 +
...Online Multiclass Learning via Convex Geometry | 1 +
...alised Gaussian Processes with Nested Sampling | 1 +
.../MarioNette: Self-Supervised Sprite Learning | 1 +
.../Mastering Atari Games with Limited Data | 1 +
...a Desired Causal State via Shift Interventions | 1 +
...networks for neural combinatorial optimization | 1 +
...on and the interpretation of geodesic distance | 1 +
...ihood Training of Score-Based Diffusion Models | 1 +
...easuring Generalization with Optimal Transport | 1 +
...ng to Identify High-Risk States and Treatments | 1 +
...mory Efficient Meta-Learning with Large Images | 1 +
...ithms for Max-k-Cut and Correlation Clustering | 1 +
...t Patch-based Inference for Tiny Deep Learning | 1 +
data/2021/neurips/Meta Internal Learning | 1 +
Backpropagation And Improving It | 1 + ...Learning Kernels for Testing with Limited Data | 1 + ...ptive Nonlinear Control: Theory and Algorithms | 1 + ...Learning Reliable Priors in the Function Space | 1 + ...earning Sparse Implicit Neural Representations | 1 + ...Learning for Relative Density-Ratio Estimation | 1 + ...ck-Box Random Search Based Adversarial Attacks | 1 + ...-Learning via Learning with Distributed Memory | 0 .../neurips/Meta-learning to Improve Pre-training | 1 + .../Meta-learning with an Adaptive Task Scheduler | 1 + ...ble Clothed Human Models from Few Depth Images | 1 + ...Task Bandits with Bayesian Hierarchical Models | 1 + ...gs Data Augmentation for Graph Neural Networks | 1 + ...poral Generalization in Neural Language Models | 1 + ...ent Slot Set Encoder for Scalable Set Encoding | 1 + ...hods for Stochastic Weakly Convex Optimization | 1 + ...arial Regret via Root-Logarithmic Regularizers | 1 + .../Minimax Regret for Stochastic Shortest Path | 1 + ...ent in Social Networks via Link Recommendation | 1 + ...efits of Two-stage and One-stage HOI Detection | 1 + ...gevin Monte Carlo: the Case Under Isoperimetry | 1 + ...specified Gaussian Process Bandit Optimization | 1 + ...earning via Offline Data With Partial Coverage | 0 ...ine Continual Learning with Neuron Calibration | 1 + ...fer via Distillation of Activated Channel Maps | 1 + ... Forecasting with Microscopic Time Series Data | 1 + ...nt: Fast online multiclass logistic regression | 1 + ...ransferring Mask Prior and Semantic Similarity | 1 + ... Estimation and PU Learning: A Modern Approach | 1 + ...ion for Alpha-Divergence Variational Inference | 1 + ...ased Imitation Learning From Observation Alone | 1 + ...tory Forecasting for Human Mobility Prediction | 1 + .../Modality-Agnostic Topology Aware Localization | 1 + ...pervised Domain Adaptation without Source Data | 1 + .../Model Selection for Bayesian Autoencoders | 1 + ...n of gradient flow in the random feature model | 1 + .../neurips/Model-Based Domain Generalization | 1 + ...pisodic Memory Induces Dynamic Hybrid Controls | 1 + ...t Learning via Imagination with Derived Memory | 1 + ...rchies with Relation-specific Hyperbolic Cones | 1 + .../Modified Frank Wolfe in Probability Space | 1 + ...dular Gaussian Processes for Transfer Learning | 1 + ...nchronous Update for Adaptive Gradient Methods | 1 + ...h With Iteratively Refining State Abstractions | 1 + ...k (MA): A New Potential Risk of Screen Photos" | 0 ...Knowledge Distillation from Out-of-Domain Data | 1 + ...ergence-based Generative Modeling on Manifolds | 1 + ...d Training on Heterogeneous Unreliable Devices | 1 + ...sed Learning for Molecular Property Prediction | 1 + ...Voltage Control on Power Distribution Networks | 1 + ...ement Learning in Stochastic Networked Systems | 1 + ...est-Arm Identification and Regret Minimization | 1 + ...ulti-Facet Clustering Variational Autoencoders | 1 + ...abel Learning with Pairwise Relevance Ordering | 1 + data/2021/neurips/Multi-Objective Meta Learning | 1 + ...ovement with Safety Constraints in Finite MDPs | 1 + ...otion Prediction with Multi-Range Transformers | 1 + ...ulti-Scale Representation Learning on Proteins | 1 + ...ian Optimization with Unknown Evaluation Costs | 1 + ...ation Learning via Total Correlation Objective | 1 + ...-armed Bandit Requiring Monotone Arm Sequences | 1 + ...lti-modal Dependency Tree for Video Captioning | 1 + ...ask Learning of Order-Consistent Causal Graphs | 1 + .../Multi-view Contrastive Graph Clustering | 1 + ...ticlass 
Boosting and the Cost of Weak Learning | 1 + ...sus Binary Differentially Private PAC Learning | 1 + ...re-training with Universal Dependency Learning | 1 + ... Few-Shot Learning with Frozen Language Models | 1 + .../neurips/Multimodal Virtual Point 3D Detection | 1 + ...ngual Embeddings for Large-Scale Speech Mining | 1 + ... Descent: Design Your Own Generalization Curve | 1 + ...d Operator Learning for Differential Equations | 1 + ...NAS-Bench-x11 and the Power of Learning Curves | 1 + ...ing on the Orbits of a Deterministic Transform | 0 ...tion Problems on Geometric Intersection Graphs | 1 + ...ality Assessment using Non-Matching References | 1 + ...mization using Implicit Neural Representations | 1 + ...o the Best Policy in Markov Decision Processes | 1 + ... for Sparse-view 3D Reconstruction in the Wild | 1 + .../NeRV: Neural Representations for Videos | 1 + .../Near Optimal Policy Optimization via REPS | 1 + ...nvex Optimization For All Orders of Smoothness | 1 + ...erimental Design for Causal Structure Learning | 1 + ...ar-Optimal No-Regret Learning in General Games | 1 + ...rcement Learning via Double Variance Reduction | 1 + ...thms for Learning Non-Linear Dynamical Systems | 1 + ...ly Horizon-Free Offline Reinforcement Learning | 1 + ...mal Reinforcement Learning for Discounted MDPs | 1 + ...blivious Algorithms for Explainable Clustering | 1 + ... causal graphical models with hidden variables | 1 + .../Neighborhood Reconstructing Autoencoders | 1 + ...ware Graph Neural Networks for Link Prediction | 1 + ...ification from Arbitrary Surrogate Experiments | 1 + data/2021/neurips/Nested Graph Neural Networks | 1 + data/2021/neurips/Nested Variational Inference | 1 + ...orcing Occam's Razor to Improve Generalization | 1 + ...Volume Rendering for Multi-view Reconstruction | 1 + ...Index Network For Restless Bandits Via Deep RL | 1 + ...al Active Learning with Performance Guarantees | 1 + ...nterpretable Machine Learning with Neural Nets | 1 + ...al Algorithmic Reasoners are Implicit Planners | 1 + ...ng Speech from Self-Supervised Representations | 1 + ...al Auto-Curricula in Two-Player Zero-Sum Games | 0 ...h Neural Network Framework for Link Prediction | 1 + data/2021/neurips/Neural Bootstrapper | 1 + ... Circuit Synthesis from Specification Patterns | 1 + ...l Distance Embeddings for Biological Sequences | 1 + ...ubber: Dubbing for Videos According to Scripts | 1 + ...h for Uncertainty Estimation and Dataset Shift | 1 + ...al Flows: Efficient Alternative to Neural ODEs | 1 + ...adiance Fields for Human Performance Rendering | 1 + ...With Multiple Modes and Stochastic Transitions | 1 + ...the Role of Stochasticity in Robust Perception | 1 + data/2021/neurips/Neural Production Systems | 1 + ...ural Program Generation Modulo Static Analysis | 1 + ...seudo-Label Optimism for the Bank Loan Problem | 1 + ...ral Taskonomy at Scale in Rodent Visual Cortex | 1 + ...ural Relightable Participating Media Rendering | 1 + data/2021/neurips/Neural Routing by Memory | 1 + ... 
Machine For Transformer-Based Text Generation | 1 + data/2021/neurips/Neural Scene Flow Prior | 1 + ...tonian Equations on General Coordinate Systems | 1 + ...Neural Tangent Kernel Maximum Mean Discrepancy | 1 + .../neurips/Neural Trees for Learning on Graphs | 1 + ...r Semi-Supervised Few-Shot Learning of 3D Pose | 1 + ...mal feedback control with local learning rules | 1 + ...egrated Lighting for Reflectance Decomposition | 1 + ...tic for Solving the Traveling Salesman Problem | 1 + ...Reliable Route Recommendation on Road Networks | 1 + ...Full Batch (in Stochastic Convex Optimization) | 1 + ...hout Trade-offs for the Sketched Newton Update | 1 + ...ation for Federated Learning with Non-IID Data | 1 + ...ation: Learning to Navigate without Navigating | 1 + .../No Regrets for Learning the Prior in Bandits | 1 + data/2021/neurips/No-Press Diplomacy from Scratch | 1 + ...gret Online Learning over Riemannian Manifolds | 1 + ...nt Local Smoothing for Scalable Graph Learning | 1 + ...rks: meta-learning useful conserved quantities | 1 + ...: Role of Symmetry Breaking in Neural Networks | 1 + ...upervised Image Denoising without Clean Images | 1 + ...évy Flights in Attractor Neural Networks | 1 + data/2021/neurips/Noisy Recurrent Neural Networks | 1 + ...TDC with General Smooth Function Approximation | 1 + ...ian Gaussian Processes for Few-Shot Regression | 1 + ...hs via Discrete Difference of Convex Algorithm | 1 + ...asymptotic Error Bounds for Bidirectional GANs | 1 + ...r Wasserstein approximation using point clouds | 1 + ...y Robust Optimization: Non-asymptotic Analysis | 1 + ...ion for Self-Adaptive 3D Human Pose Estimation | 1 + ...imation of continuous DPPs with kernel methods | 1 + ...ntiation for Machine-Learning and Optimization | 1 + ... and Log Odds Correction with Rare Events Data | 1 + ...c Transformers for Efficient Image Recognition | 1 + ...ers are Robust in Graph Convolutional Networks | 1 + ...the Constrained Most Probable Explanation Task | 1 + ...g Statistics and Mutual Knowledge Distillation | 1 + ...: A Simple yet Effective Exploration Criterion | 1 + .../Numerical Composition of Differential Privacy | 1 + ...rical influence of ReLU'(0) on backpropagation | 1 + ...on for Natural Language Understanding via ADMM | 1 + ...eep Generative Models for Lossless Compression | 1 + ...GCNN: 3D Object Detection using Dynamic Graphs | 1 + ...ressing Causal Confusion in Imitation Learning | 1 + ...with Generative Spatial-Temporal Factorization | 1 + ...ive Learning for Debiased Scene Representation | 1 + ...Observation-Free Attacks on Stochastic Bandits | 1 + ...ierarchical Implicit Functions for 3D Modeling | 1 + ...f-Policy Risk Assessment in Contextual Bandits | 1 + ... Learning via Pessimistic Dual Value Iteration | 1 + ...enges and Effective Data Collection Strategies | 1 + .../Offline Model-based Adaptable Policy Learning | 1 + .../Offline RL Without Off-Policy Evaluation | 1 + ... Learning as One Big Sequence Modeling Problem | 1 + ... Learning with Reverse Model-based Imagination | 1 + ...untable Multi-Agent Sequential Decision Making | 1 + ...n Calibration and Out-of-Domain Generalization | 1 + ... Interactions in Two-Stage Recommender Systems | 1 + ...astive Representations of Stochastic Processes | 1 + .../On Density Estimation with Diffusion Models | 0 ...heduling of Model-based Reinforcement Learning | 1 + ...imization with Dependent and Heavy-Tailed Data | 1 + ..., Prototypical Networks, and Few-Shot Learning | 1 + ...
for Heterogeneous Treatment Effect Estimation | 1 + ...d Corruptions in Natural Corruption Robustness | 1 + ...r Solving Placement and Routing in Chip Design | 1 + ...n Large-Cohort Training for Federated Learning | 1 + ...ns for Transfer Learning with Multiple Sources | 1 + ...of SGD and Input-Smoothness of Neural Networks | 1 + .../On Locality of Local Explanation Models | 1 + ...gin-Based Cluster Recovery with Oracle Queries | 1 + ...zation in Probabilistic Deep Generative Models | 1 + ...led Object Detection and Instance Segmentation | 1 + .../On Optimal Interpolation in Linear Regression | 1 + ...rsarial Corruption in Online Decision Problems | 1 + ...ls: Group Representation and Isotropic Scaling | 1 + ...nforcement Learning from Expert Demonstrations | 1 + ...lly Frozen Weights in Sequential Task Learning | 1 + ...Depth in Training Graph Convolutional Networks | 1 + ...e Matrices with the Bures-Wasserstein Geometry | 1 + ...tational Complexity and Barycenter Computation | 1 + ...A Second Look at Transferable Targeted Attacks | 1 + ...plication to Ranking from Pairwise Comparisons | 1 + data/2021/neurips/On Training Implicit Models | 1 + data/2021/neurips/On UMAP's True Loss Function | 1 + ...rning sparse vectors from mixture of responses | 1 + ...n sensitivity of meta-learning to support data | 1 + ... Algorithmic Stability of Adversarial Training | 1 + ...iance-Cost Tradeoff of Stochastic Optimization | 1 + ...sed Model-Agnostic Meta-Reinforcement Learning | 1 + ...ncy of Variance-Reduced Policy Gradient Method | 1 + ...or-Guided Zeroth-Order Optimization Algorithms | 1 + ...ep Decay Step-Size for Stochastic Optimization | 1 + ...c Hardness of Learning Single Periodic Neurons | 1 + ...ween Neural Network and Support Vector Machine | 1 + .../On the Estimation Bias in Double Q-Learning | 1 + ... Existence of The Adversarial Bayes Classifier | 1 + .../On the Expected Complexity of Maxout Networks | 1 + .../neurips/On the Expressivity of Markov Reward | 1 + .../On the Frequency Bias of Generative Models | 1 + ... the Generative Utility of Cyclic Conditionals | 1 + ...or Detecting Distributional Shifts in the Wild | 1 + ...eneralization of Probabilistic Image Modelling | 1 + ...ning with Batch Normalization and Weight Decay | 1 + ...erentiable Learning versus PAC and SQ Learning | 1 + .../On the Power of Edge Independent Graph Models | 1 + ...le Generalization of Recurrent Neural Networks | 1 + ...e Representation Power of Set Pooling Networks | 1 + ...of Solutions to Elliptic PDEs in Barron Spaces | 1 + ...ation in Double Descent: A Least Squares Study | 1 + ...mplexity of Learning under Geometric Stability | 0 ... of Privately Learning Axis-Aligned Rectangles | 1 + ...onvergence Properties of Random Search Methods | 1 + ...the Stochastic Stability of Deep Markov Models | 1 + ...mality of Thompson Sampling in High Dimensions | 1 + ...cement Learning with Once-per-Episode Feedback | 1 + ...f Graph Neural Networks on Large Random Graphs | 1 + ... with Stochastic Differential Equations (SDEs) | 1 + ...te Gradients in Variational Autoencoder Models | 1 + ...d Function Approximation in Imitation Learning | 1 + ...ce of the Fisher Information for Deep Learning | 1 + ...e and loss function in classification problems | 1 + ...ured Attention Graphs for Image Classification | 1 + ...gle Cosine Similarity based Learning Objective | 1 + ...operative Bandits with Imperfect Communication | 1 + ...ges with Cross-lingual Dense Passage Retrieval | 1 + ... 
Active Learning with Surrogate Loss Functions | 1 + .../Online Adaptation to Label Distribution Shift | 1 + ...trol of Unknown Time-Varying Dynamical Systems | 1 + ...imization with Continuous Switching Constraint | 1 + .../Online Facility Location with Multiple Advice | 1 + .../Online Knapsack with Frequency Predictions | 1 + ...ral Computations From Sparse Temporal Feedback | 1 + ...f Complex Dynamical Systems from Sensory Input | 1 + .../Online Learning in Periodic Zero-Sum Games | 1 + ... Equilibrium with Application to Fair Division | 1 + ...on-Asymptotic Performances of Greedy Algorithm | 1 + ...ne Multi-Armed Bandits with Adaptive Inference | 1 + ... Reinforcement Learning with Model Uncertainty | 1 + ...Selective Classification with Limited Feedback | 1 + ...f the Number of Errors in Thresholding Bandits | 1 + ...e Variational Filtering and Parameter Learning | 1 + ...ment Learning by Planning with a Learned Model | 1 + ...e control for anomaly detection in time series | 1 + ...ear function approximation and bandit feedback | 1 + ... Neural Network Training And Pruning Framework | 1 + data/2021/neurips/Open Rule Induction | 1 + ...mprove Robustness Against Inherent Label Noise | 1 + ...rning with Open-set Consistency Regularization | 1 + ...s for Stochastic Contextual Preference Bandits | 1 + ... Identification Methods for Tail-Risk Measures | 1 + ...Algorithms for Non-concave Bandit Optimization | 1 + ...der Simple Regret for Gaussian Process Bandits | 1 + .../neurips/Optimal Policies Tend To Seek Power | 1 + ...ity Estimation under Communication Constraints | 1 + ...mal Rates for Random Order Online Optimization | 1 + .../Optimal Sketching for Trace Estimation | 1 + .../Optimal Underdamped Langevin MCMC Method | 1 + ...eneous, Reward-Free and Task-Agnostic Settings | 1 + ...of Markov chains with and without spectral gap | 1 + ... 
Federated Learning: A Game-theoretic Approach | 1 + ...e for stochasticblock model with missing links | 1 + ...tigrid Coarsening Using Reinforcement Learning | 1 + ...nditional Value-At-Risk of Black-Box Functions | 1 + ...ralization Bound via Anisotropic Noise of SGLD | 0 ...wledge for Continual Learning via Metalearning | 1 + ...Complexity in Nonsmooth Nonconvex Optimization | 1 + ...zation in Factored MDPs with Unknown Structure | 1 + ...stribution Generalization in Kernel Regression | 1 + ...inforcement Learning via Variational Inference | 1 + ...ental Few-Shot Learning by Finding Flat Minima | 1 + ...ercoming the Convex Barrier for Simplex Inputs | 1 + ...ian regularization in semi-supervised learning | 1 + ...reveals image classification model pathologies | 1 + ...bustness to Covariate Shift in High Dimensions | 1 + ...ssage Passing in Rotationally Invariant Models | 1 + ...ks Motivated by Partial Differential Equations | 1 + ...ing generative models with recovery guarantees | 0 ...of Program Learning, Understanding, and Repair | 1 + ...ing via Penalizing Out-of-Distribution Samples | 1 + ...presentations for Effective Probability Models | 1 + ...Conceptual, Relational, and Physical Reasoning | 1 + ...D Scene Reconstruction From a Single RGB Image | 1 + ...l Ridge Regression by Feature Space Partitions | 1 + ...jectives with Expected Hypervolume Improvement | 1 + ...and Efficient Hierarchical k-Median Clustering | 1 + data/2021/neurips/Parallelizing Thompson Sampling | 1 + .../Parameter Inference with Bifurcation Diagrams | 1 + ...meter Prediction for Unseen Deep Architectures | 1 + ...Parameter-free HE-friendly Logistic Regression | 1 + ...e Transfer for Personalized Federated Learning | 1 + ...ds for Approximating PDEs with Neural Networks | 1 + ...ed Quantum Policies for Reinforcement Learning | 1 + ...nted Algorithms for Online Conversion Problems | 1 + ...osing the gap between human and machine vision | 1 + ... Federated Learning via Partial Initialization | 1 + ...essage Passing Generative Adversarial Networks | 1 + ... Network with Global Convergence Rate Analysis | 1 + ...tion and Code: learning how to compress graphs | 1 + ...r Optimization of Trained ReLU Neural Networks | 1 + ...ral networks predicts human visual selectivity | 1 + ... Signal Mid-level Patches in Referential Games | 1 + data/2021/neurips/Pay Attention to MLPs | 1 + ...ultilingual and Multi-Domain Sequence Modeling | 1 + ... is Not All You Need for Semantic Segmentation | 1 + ...terogeneous Agents via Personalized Simulators | 1 + ...What Data Modalities Does Your Model Perceive? 
| 1 + ...iodic Activation Functions Induce Stationarity | 1 + ...ncoder for Graph-Level Representation Learning | 1 + .../Permuton-induced Chinese Restaurant Process | 1 + ...zed Federated Learning With Gaussian Processes | 1 + ...g and learning in discrete energy-based models | 1 + ...urbation Theory for the Information Bottleneck | 1 + ...dictive Control in Linear Time Varying Systems | 1 + ...ly Efficient Offline Mean-Field Multi-Agent RL | 1 + ...oo: Gym for Multi-Agent Reinforcement Learning | 1 + ...rential Privacy with Direct Feedback Alignment | 1 + ...with Deep Learning for Scalable Flood Modeling | 1 + ...r Robust and Interpretable Generative Modeling | 1 + ...le Learning To Rank via Differentiable Sorting | 1 + .../Pipeline Combinators for Gradual AutoML | 1 + ...ltidimensional Planner for DNN Parallelization | 1 + ...nments with Combinatorially Hard Search Spaces | 1 + ...g Games as Classifying Markov Decision Process | 1 + ...irtual Trajectories for Reinforcement Learning | 1 + ...ion Estimation under Communication Constraints | 1 + ... Detection and Segmentation with Polar Pillars | 0 ...ient Offline and Online Reinforcement Learning | 1 + .../Policy Learning Using Weak Supervision | 1 + ...MDPs: Improved Exploration via Dilated Bonuses | 1 + .../Pooling by Sliced-Wasserstein Embedding | 1 + ...ble and High-Quality Generative Text-to-Speech | 1 + .../2021/neurips/Post-Contextual-Bandit Inference | 1 + ...t-Training Quantization for Vision Transformer | 1 + .../Post-Training Sparsity-Aware Quantization | 1 + .../Post-processing for Individual Fairness | 1 + ...llapse and Latent Variable Non-identifiability | 1 + .../Posterior Meta-Replay for Continual Learning | 1 + ... A sparsity inducing weight reparameterisation | 1 + ... Programming using Primal-Dual Hybrid Gradient | 1 + ...actical Near Neighbor Search via Group Testing | 1 + ...ealizable Setting: The Power of True Believers | 1 + ...pression for Human-in-the-Loop Decision-Making | 1 + ... predictive distribution of deep ReLU networks | 1 + ...r-Parameterized Nonconvex Matrix Factorization | 1 + ...neralization with Perturbation Response Curves | 1 + ... Memorability from Contextual Visual Semantics | 1 + ... Conformation via Dynamic Graph Score Matching | 1 + ... Know Helps: Provable Self-Supervised Learning | 1 + ...with brain-inspired predictive coding dynamics | 1 + ...eferences in Auction Design with Deep Learning | 1 + ...irectional compression in distributed settings | 1 + ...ions for Data-Efficient Reinforcement Learning | 1 + ... Dynamic Auctions for a Value-maximizing Buyer | 1 + ...e Non-smooth ERM and SCO in Subquadratic Steps | 1 + ...on-private Uniformity Testing for Ranking Data | 1 + .../Private learning implies quantum stability | 1 + ...ly Learning Mixtures of Axis-Aligned Gaussians | 1 + data/2021/neurips/Privately Learning Subspaces | 1 + .../Privately Publishable Per-instance Privacy | 1 + ...am-Guided Transformer for Program-Guided Tasks | 1 + ...ilistic Attention for Interactive Segmentation | 1 + ...tion Model for Reasoning over Knowledge Graphs | 1 + ...robabilistic Forecasting: A Level-Set Approach | 1 + ...r Instance Reweighting in Adversarial Training | 1 + ...position of Neural Population Spiking Activity | 1 + ...abilistic Transformer For Time Series Analysis | 1 + ...ths and the Structure of Predictions over Time | 1 + ...Attention for Vision-and-Language Pre-training | 1 + ... 
Society (PALMS) with Values-Targeted Datasets | 1 + ...i-Objective Stein Variational Gradient Descent | 1 + ...t Learning for Partially Observed Environments | 1 + ...e Transforms for Monocular 3D Object Detection | 1 + ...ure Interaction Search for Deep Sparse Network | 1 + data/2021/neurips/Projected GANs Converge Faster | 1 + data/2021/neurips/Proper Value Equivalence | 1 + ...rks for Few-Shot Molecular Property Prediction | 1 + ...articipatory Budgeting with Additive Utilities | 1 + ... for Multiple Object Tracking and Segmentation | 1 + ...tic Methods for Offline Reinforcement Learning | 1 + ...d Deep Learning with Spectral Contrastive Loss | 1 + ...ng: Shelve Optimism, Embrace Virtual Curvature | 1 + ...or Imitation with Contrastive Fourier Features | 1 + ...isoning Attacks Against Reinforcement Learning | 1 + ...nt Learning with Confounded Observational Data | 1 + ...ion Approximation under Adaptivity Constraints | 1 + ...bly Faster Algorithms for Bilevel Optimization | 1 + ...ation Benefit for Invariance in Kernel Methods | 1 + ...ask reinforcement learning with model transfer | 1 + ... efficient, succinct, and precise explanations | 1 + ...of Neural Networks Trained by Gradient Descent | 1 + ... Normalization while Removing Batch Dependence | 1 + ...d Neural Networks with Iterative Randomization | 1 + .../Pseudo-Spherical Contrastive Divergence | 1 + .../Pure Exploration in Kernel and Neural Bandits | 1 + ...ion with Synthetic Boundary Supporting Samples | 1 + ...n Artifacts for Achieving Adversarial Outcomes | 1 + ...lation with Applications to Federated Learning | 1 + ...oving Transferability in Domain Generalization | 1 + ...-Drop: Regularized Dropout for Neural Networks | 1 + ...Structured Compression of Deep Neural Networks | 1 + ...iple Interacting People under Weak Supervision | 1 + ... Efficient and Robust Semi-Supervised Learning | 1 + ...able Influence-based Active Learning on Graphs | 1 + ...tent MDPs: Regret Guarantees and a Lower Bound | 1 + ...d Reinforcement Learning is a Dataflow Problem | 1 + ... for Cooperative Reinforcement Learning Agents | 1 + ...mory Management for Class-Incremental Learning | 1 + ... Defense Against Query-Based Black-Box Attacks | 1 + ... After Many Epochs on Ill-Conditioned Problems | 1 + ...ecovery: Subgradient Method and Exact Recovery | 1 + data/2021/neurips/Ranking Policy Decisions | 1 + ...e-Optimal Subspace Estimation on Random Graphs | 1 + ...Plug-in Estimators via Barycentric Projections | 1 + ... 
MLP based Multi-Modal Information Unscrambler | 0 ...ieval and transductive few-shot classification | 1 + ...tribution Detection With Rectified Activations | 1 + .../neurips/ReLU Regression with Massart Noise | 1 + ...elf-Supervised Learning with Weak Augmentation | 1 + ...c evaluation of transductive few-shot learning | 1 + ...Auxiliary Classifier GANs with Stable Training | 1 + ...ounding Bandits for Modeling Satiation Effects | 1 + ...ognizing Vector Graphics without Rasterization | 1 + ...onstruction for Powerful Graph Representations | 1 + ...or for Generalization to Distributional Shifts | 1 + ...iors using the Restricted Eigenvalue Condition | 1 + .../Rectangular Flows for Manifold Learning | 1 + ...t Learning of Background for Few-Shot Learning | 1 + ...ural Networks with Recurrent Layer Aggregation | 1 + ...er Chains for Exact Multi-Label Classification | 1 + ...ular Welfare and Matroid Blocking Semi-Bandits | 1 + ...xt-Free Grammars and Dynamic Bayesian Networks | 1 + ...resence of Latent Variables and Selection Bias | 1 + ...Insights from Multi-particle Dynamical Systems | 1 + ...ed Motion Planning Using Graph Neural Networks | 1 + ...ck for Weakly Supervised Semantic Segmentation | 1 + ...ft by Mirror Samples in Cross Domain Alignment | 1 + ...e-step Approach to Multi-task Visual Grounding | 1 + ...ng Bounds for Kernel and Approximate $k$-Means | 1 + ...anguage Models with Compositional Explanations | 1 + ...hot Action Recognition for Multi-label Actions | 1 + data/2021/neurips/Regime Switching Bandits | 1 + ...Gaussian-Process Optimization in Large Domains | 1 + ...ce Replay in Off-Policy Reinforcement Learning | 1 + ...Regularization in ResNet with Stochastic Depth | 1 + ...Dense CRFs: Generalizing Mean Field and Beyond | 1 + ...egularized Softmax Deep Multi-Agent Q-Learning | 1 + ...gulating algorithmic filtering on social media | 1 + ...on Function Learning for Bayesian Optimization | 1 + ...g Enhanced Explainer for Graph Neural Networks | 1 + ...ease Progression Model for Alzheimer's Disease | 1 + ...: Constant Regret and Representation Selection | 1 + ...forcement Learning in Newcomblike Environments | 1 + .../Reinforcement Learning in Reward-Mixing MDPs | 1 + .../Reinforcement Learning with Latent Flow | 1 + ...iselessly Observable Markov Decision Processes | 1 + ...n of variational quantum circuit architectures | 1 + ...s Missing in Attention for Video Understanding | 1 + .../neurips/Relative Flatness and Generalization | 1 + ...nty Learning for Facial Expression Recognition | 1 + ...eomorphisms indicates performance in deep nets | 1 + ...ncy for Differentially Private Query Answering | 1 + data/2021/neurips/Relaxing Local Robustness | 1 + ...entralized Deep Learning on Heterogeneous Data | 1 + ...h Improved Exact Search and Weaker Assumptions | 1 + .../Reliable Decisions with Threshold Calibration | 1 + ...criminator in Reproducing Kernel Hilbert Space | 1 + ...ations: Modeling Uncertainty in Explainability | 1 + ...rning for Health Using Dataset Shift Detection | 1 + ...t to Forget: Algorithms for Machine Unlearning | 1 + ...y from Functional Data in Systems Neuroscience | 1 + ...bsampled Shuffle Model In Distributed Learning | 1 + ...sed Policy Search via Recursive Classification | 1 + .../Replay-Guided Adversarial Environment Design | 1 + ...of Linear Neural Networks: Analysis and Design | 1 + ...on Learning Beyond Linear Prediction Functions | 1 + ...n Learning for Event-based Visuomotor Policies | 1 + .../Representation Learning on Spatial Networks | 1 + ...on of 
Deep Neural Networks and Ensemble Models | 1 + ... Space Accurately using Multi-Component Floats | 1 + ...or Graph Neural Networks with Global Attention | 1 + .../neurips/Repulsive Deep Ensembles are Bayesian | 1 + ...Models with Improved Representation Guarantees | 1 + ...n Efficient Transformer for Visual Recognition | 1 + ...thway Priors for Soft Equivariance Constraints | 1 + ...axation for Multi-view Representation Learning | 1 + ...: Debiasing graph embedding with random graphs | 1 + ...l Networks: Do Not Be Afraid of Overconfidence | 1 + ...ing Graph Transformers with Spectral Attention | 1 + ...Rethinking Neural Operations for Diverse Tasks | 1 + ...verage for Efficient Video Object Segmentation | 1 + ...-Label Ranking: Consistency and Generalization | 1 + ...sing geometrically structured latent manifolds | 1 + ...ent sparsification as total error minimization | 1 + ...ning Criteria for Convolutional Neural Network | 1 + ...rpretation of Accelerated Optimization Methods | 1 + ... Adult: New Datasets for Fair Machine Learning | 1 + ...ive Projections over Submodular Base Polytopes | 1 + ... and Protecting Labels in Distributed Training | 1 + ...imization via machine learning with noisy data | 1 + ... optimizers reveals known and novel mechanisms | 1 + ...th Jacobian switching linear dynamical systems | 1 + ...plement Equivariant Networks for DNA Sequences | 1 + ...arning through the Lens of Multi-Task Learning | 1 + ...bject Detection From an Egocentric Perspective | 1 + ...nsupervised Learning of Visual Representations | 1 + ...visiting Deep Learning Models for Tabular Data | 1 + ...r-discriminator Cooperative Compression Scheme | 1 + ...ormation Bottleneck for Adversarial Robustness | 1 + ...el Stitching to Compare Neural Representations | 1 + ...Nets: Improved Training and Scaling Strategies | 1 + .../neurips/Revisiting Smoothed Online Learning | 1 + ...ting the Calibration of Modern Neural Networks | 1 + ...Self-Supervised Visual Representation Learning | 1 + .../2021/neurips/Reward is enough for convex MDPs | 1 + ...nt Learning with Linear Function Approximation | 1 + ...ation for a Smart Predict-then-Optimize Method | 1 + ...Margin Classification on Sub-Gaussian Mixtures | 1 + ... Guarantees for Supervised and Policy Learning | 1 + .../Risk Monotonicity in Statistical Learning | 1 + ...k-Averse Bayes-Adaptive Reinforcement Learning | 1 + ...einforcement Learning using Successor Features | 1 + ...k-averse Heteroscedastic Bayesian Optimization | 1 + ...daptation for Offline Model-based Optimization | 1 + .../Robust Allocations with Diversity Constraints | 1 + ...obust Auction Design in the Auto-bidding World | 1 + ...ressed Sensing MRI with Deep Generative Priors | 1 + ...ing Negative Samples with Diminished Semantics | 1 + ...rfactual Explanations on Graph Neural Networks | 1 + ...einforcement Learning through Adversarial Loss | 1 + ...n Shift via Minimum Discriminating Information | 1 + ...plicit Networks via Non-Euclidean Contractions | 1 + ...nt Learning under Transition Dynamics Mismatch | 1 + .../neurips/Robust Learning of Optimal Auctions | 1 + .../neurips/Robust Online Correlation Clustering | 0 ... 
Multilingual Translation with Imbalanced Data | 1 + ...rowded Scenes with Direct Pose-Level Inference | 1 + data/2021/neurips/Robust Predictable Control | 1 + ...ed: Acceleration and Improved Estimation Rates | 1 + ...ing via Language Guided Neural Module Networks | 1 + ...mposable Average Precision for Image Retrieval | 1 + ...-and-Bounded Learning (With Outliers) Problems | 1 + ...ust and differentially private mean estimation | 1 + ...of Learning Latent Trees with Vector Variables | 1 + .../Robustness between the worst and average case | 1 + .../Robustness of Graph Neural Networks at Scale | 1 + ...stness via Uncertainty-aware Cycle Consistency | 1 + ...ity by Projection in Knowledge Graph Embedding | 1 + ...inate Frames For Interacting Dynamical Systems | 1 + ...ow-clustering of a Point Process-valued Matrix | 1 + ...Dual Graph Aggregation Network for Text-to-SQL | 1 + ...e Progressive Encoding for Neural Optimization | 1 + ...l Networks via Stochastic Bilevel Optimization | 1 + ...lecular wavefunctions and electronic densities | 1 + ... Learning using Exploration and 3D Consistency | 1 + ...Regularization, Batch-size and Multiple-epochs | 1 + ...bolic Interactive Language Grounding Benchmark | 0 ...s Based Active Learning In Realistic Scenarios | 1 + ...entations via Unsupervised Video Decomposition | 1 + ...s Structure Learning for Graph Neural Networks | 1 + ...erence in High-Dimensional Logistic Regression | 1 + ... Solving Noisy Inverse Problems Stochastically | 1 + ...Transformer for Vision-and-Language Navigation | 1 + ...oftmax-free Transformer with Linear Complexity | 1 + .../SOLQ: Segmenting Objects by Learning Queries | 1 + .../SOPE: Spectrum of Off-Policy Estimators | 1 + ...-scale Approximate Nearest Neighborhood Search | 0 ... by Decoupling Multi-Hop and Logical Reasoning | 1 + ... Learning for Domain Adaptive Object Detection | 1 + .../SSMF: Shifting Seasonal Matrix Factorization | 1 + ... for Exemplar-based Class-Incremental Learning | 1 + ...munication Complexities for Federated Learning | 1 + ...esence of Limited In-Distribution Labeled Data | 1 + ... Recursive Momentum for Nonconvex Optimization | 0 ... and Universal Framework of Adaptive Gradients | 1 + ...: Domain Generalization by Seeking Flat Minima | 1 + ...cal Generalized Linear Function Approximations | 1 + .../Safe Pontryagin Differentiable Programming | 1 + ...orcement Learning by Imagining the Near Future | 1 + ...ent Learning with Natural Language Constraints | 1 + ...arning against Both Stragglers and Adversaries | 1 + ...e Graph Explanations for Commonsense Reasoning | 1 + ...for Active Ranking from Multi-wise Comparisons | 1 + ...earch Configuration: Cutting Planes and Beyond | 15 +++++++++++++++ .../Sample Selection for Fair and Robust Training | 1 + ...of Stackelberg Equilibria in General-Sum Games | 1 + ...nearly Realizable MDPs with Limited Revisiting | 1 + ...rly-Parameterized MDPs with a Generative Model | 1 + ...es Your Winning Ticket Really Win the Jackpot? | 1 + ...evance determination and discrete noise models | 1 + ...del Selection for Accessible Transfer Learning | 1 + ...ching of the Fokker-Planck-Kolmogorov Equation | 1 + ...parsely-changing Gaussian Markov Random Fields | 1 + ...ntervention Target Estimation in Linear Models | 1 + ...rver: A Data Recommender for Transfer Learning | 1 + ...lanning via Reinforcement Learning Fine-Tuning | 1 + ...Inference for Instrumental Variable Regression | 1 + ...tion Learning for Interpretable Classification | 1 + ... 
Sampling using Sparse Gaussian Process Models | 1 + ...Flexible Classifiers with Fairness Constraints | 1 + ...ne learning, structured like classical physics | 1 + ...rsarial Patches with Sparse Superficial Layers | 1 + ...istillation to Many Classes with Proxy Targets | 1 + ...vative Information Using Variational Inference | 1 + ...gent Kernels via Sketching and Random Features | 1 + ...t Neural Network Compression by ReLU Stability | 1 + .../Scaling Vision with Sparse Mixture of Experts | 1 + ...Markov Chains Helps Resolve Underspecification | 1 + ...Databases to Scalable Differentiable Reasoning | 1 + ...rbrain: Unifying Sparse and Low-rank Attention | 1 + .../Scheduling jobs with stochastic holding costs | 1 + ...core-based Generative Modeling in Latent Space | 1 + ...ral Networks for Large-Scale Optimal Transport | 1 + ...ing Parameterized AP Loss for Object Detection | 1 + ...r Efficient Transformers for Language Modeling | 1 + ...arching the Search Space of Vision Transformer | 1 + .../neurips/Second-Order Neural ODE Optimizer | 1 + ... Consistency Learning for Scene Classification | 1 + ...gn for Semantic Segmentation with Transformers | 1 + ...ve Sampling for Online Best-arm Identification | 1 + ...Point Processes with Nonparametric Time Decays | 1 + ...Individual Input-Output Pairs in Deep Learning | 1 + .../neurips/Self-Consistent Models and Values | 1 + ...ted Samples in Generative Adversarial Networks | 1 + ...ed Recurrent Units with Dynamic Soft Recursion | 1 + ...with Transformation Equivariant Interpretation | 1 + ...ed Medical Image Segmentation with Meta-labels | 1 + .../Self-Supervised Bug Detection and Repair | 1 + .../Self-Supervised GANs with Label Augmentation | 1 + ...g Disentangled Group Representation as Feature | 1 + ...ased Optical Flow with Spiking Neural Networks | 1 + ...entations Provably Isolates Content from Style | 1 + ...d Learning with Kernel Dependence Maximization | 1 + ...i-Object Tracking with Cross-input Consistency | 1 + ...rk Weights for Model Characteristic Prediction | 1 + ...egmentation via Adaptive Equalization Learning | 1 + ...brium Models and Applications to Certification | 1 + ...ernel and Feature-Learning Probability Metrics | 1 + ...-Sequence Learning with Latent Neural Grammars | 1 + ...orithms for Testing Closeness of Distributions | 1 + ...Imitation Learning with Unobserved Confounders | 1 + .../neurips/Set Prediction in the Latent Space | 1 + ...g the Variance of Multi-Agent Policy Gradients | 1 + ...ape As Points: A Differentiable Poisson Solver | 1 + ...Shape Registration in the Time of Transformers | 1 + ...red 3D Shape and Motion of Fast Moving Objects | 1 + ...ization Approach to Deterministic Autoencoders | 1 + ...Transformer using Factorized Reshaped Matrices | 1 + ... activity-context priors from egocentric video | 1 + ...e limits of the Shapley value for explanations | 1 + ...ponent Analysis for Multi-Subject Neuroimaging | 1 + ... Impossibility Results for Hyper-graph Testing | 1 + ...t Invariance Can Reduce Adversarial Robustness | 1 + ...e Limitations of Localized Graph Training data | 1 + ... 
for Spatio-Temporal Representational Learning | 1 + ...ethods for stochastic variational inequalities | 1 + data/2021/neurips/Sim and Real: Better Together | 1 + ...Training using Gradient Similarity Measurement | 1 + ...and Matching of Neural Network Representations | 1 + ...dient Descent Algorithms for Pairwise Learning | 1 + ...olfe and generalized self-concordant functions | 1 + ...m Likelihood for Out-of-Distribution Detection | 1 + ...SketchGen: Generating Constrained CAD Sketches | 1 + ...ased Piano Transcription With Neural Semi-CRFs | 1 + .../Slice Sampling Reparameterization Gradients | 1 + ...: A Scalable Measure of Statistical Dependence | 1 + ...ilarity Computation via Knowledge Distillation | 1 + ...erparameterized low-rank matrix reconstruction | 1 + ... Bilevel Programming for Sparse Regularization | 1 + data/2021/neurips/Smooth Normalizing Flows | 1 + ... Smoothed Classifiers for Certified Robustness | 1 + ...ession Techniques for Distributed Optimization | 1 + ...onal continuous control via parameter freezing | 1 + ...oft Calibration Objectives for Neural Networks | 1 + ... Games with Tree Search and Imitation Learning | 0 ...h Hidden Structure via Gradient Descent Ascent | 1 + ...via $k$-Sparse Discrete Wasserstein Barycenter | 1 + ...ce-time Mixing Attention for Video Transformer | 1 + ...ework Immune to Local Traps and Miscalibration | 1 + .../Sparse Flows: Pruning Continuous-depth Models | 1 + ...ith Application to Permutation Synchronisation | 1 + data/2021/neurips/Sparse Spiking Gradient Descent | 1 + ...ation and Tracking of Object Poses in 3D Space | 1 + ...ting Pruning Plasticity with Neuroregeneration | 1 + ...ntation in Deep Learning with Inducing Weights | 1 + .../Sparse is Enough in Scaling Transformers | 1 + ...n and Planning in Partially Observable Domains | 1 + ...othing Mechanism for Student-Teacher Framework | 1 + ...ellite Imagery via Conditional Pixel Synthesis | 1 + ...Spatio-Temporal Variational Gaussian Processes | 1 + ...omposition in 3D Convolutional Neural Networks | 1 + ...for dynamic networks with stability guarantees | 1 + ...tion for Accurate Blind Image Super-Resolution | 1 + ...s Fully Recurrent Convolutional Neural Network | 1 + ...ch-T: Transducer for Text to Speech and Beyond | 1 + ...ance Estimation for Neural Architecture Search | 1 + ...ized Neural Network using SGD and Weight Decay | 1 + ...of Topological Changes via Geometric Alignment | 1 + ...suit: Tuning-Free Noisy Robust Matrix Recovery | 1 + ...ral Networks without the Neural Tangent Kernel | 1 + ...mal Risk Bounds with Convergence Rate $O(1 n)$ | 1 + ...vel Programming in Hyperparameter Optimization | 1 + ...nd Vision Transformers under Data Augmentation | 1 + ... Dynamical Systems via Policy Gradient Methods | 1 + ...ints for Defending Against Adversarial Attacks | 1 + ...ed Attention with Relative Positional Encoding | 1 + ...teful ODE-Nets using Basis Function Expansions | 1 + data/2021/neurips/Stateful Strategic Regression | 1 + ...with M-Estimators on Adaptively Collected Data | 1 + ...er Bounds for List-Decodable Linear Regression | 1 + ...tein Autoencoder with Latent Space Consistency | 1 + ...l Models in the Presence of Latent Confounders | 1 + ... 
Efficient Linear Meta-representation Learning | 1 + ...n Mixing for Nonconvex Stochastic Optimization | 1 + .../Stochastic Bias-Reduced Gradient Methods | 1 + ...vergence Analysis under Expected Co-coercivity | 1 + ...stic Multi-Armed Bandits with Control Variates | 1 + ...ession: the Forward Algorithm to Replace Ridge | 1 + ...cision-Recall Curves with Provable Convergence | 1 + ...Parameter-Free and Towards Horizon-Free Regret | 1 + ...roblems using the Prior Implicit in a Denoiser | 1 + ...Stochastic bandits with groups of similar arms | 1 + ...cay schedules, and high probability guarantees | 0 ...r General Stochastic Automatic Differentiation | 1 + ...liss: Iterative Voting Improves Social Welfare | 1 + ... Identification with Reverse Experience Replay | 2 ++ .../neurips/Stronger NAS with Weaker Predictors | 1 + ...n Neural Networks using Reinforcement Learning | 1 + ... Bregman information, and exponential families | 1 + ...ructure-Aware Random Fourier Kernel for Graphs | 1 + ...sing Diffusion Models in Discrete State-Spaces | 1 + ...ational Inference for Bayesian Neural Networks | 1 + ...ing Latent Alignments in Sequence Transduction | 1 + ...eraging Dropout in RNNs for Efficient Training | 1 + ...logue Generation with Multi-Pass Dual Learning | 1 + ...Sub-Linear Memory: How to Make Performers SLiM | 1 + ...ta for Self-Supervised Representation Learning | 1 + .../Subgame solving without common knowledge | 1 + ...ampling for Off-Policy Evaluation and Learning | 1 + .../Subgoal Search For Complex Reasoning Tasks | 1 + ...ated Learning with Missing Neighbor Generation | 1 + ...lization and Fairness of Graph Neural Networks | 1 + data/2021/neurips/Submodular + Concave | 1 + ...erparameterization for Shallow Neural Networks | 1 + ...orizon Goal-Conditioned Reinforcement Learning | 1 + ...ergy-based Contrastive Representation Transfer | 1 + ...sing the Transfer of Reasoning Patterns in VQA | 1 + ... Signals from a Mixture of Linear Measurements | 1 + ...n coincide with very high-dimensional features | 1 + .../Surrogate Regret Bounds for Polyhedral Losses | 1 + ...eous Treatment Effects from Time-to-Event Data | 1 + ...arnt Hamiltonian Dynamics Inferred from Vision | 1 + ... Learning Enhanced Genetic Programming Seeding | 0 ...act Gradient of Neural ODE with Minimal Memory | 1 + ...t Effect Estimation with Longitudinal Outcomes | 1 + ...to Experimental Design with Synthetic Controls | 1 + ...stematic Generalization with Edge Transformers | 1 + ...r Structured Sparsity and Smoothness on Graphs | 1 + ...y Abstract Actor-Critic for Continuous Control | 1 + ... NAS Predictor with a Self-evolution Framework | 1 + ...pproach towards Few-shot Hypothesis Adaptation | 1 + ...moting Gradient Diversity and Model Smoothness | 1 + ...-Supervised Test-Time Training Fail or Thrive? | 1 + ... from Scratch with Deep Reinforcement Learning | 1 + ... and Pessimism for Deep Reinforcement Learning | 1 + ...ing unsupervised objectives at prediction time | 1 + ...Cooperative Multi-Agent Reinforcement Learning | 1 + .../neurips/Targeted Neural Dynamical Modeling | 1 + ... 
Network Search with Meta-Contrastive Learning | 1 + ...re Deactivation Using Out-of-Distribution Data | 1 + ...al structure in neural network loss landscapes | 1 + ...Reinforcement Learning via Advice Distillation | 1 + ...ng an Active Learner with Contrastive Examples | 1 + ...the Learning-with-Equivalence-Queries Paradigm | 1 + .../Techniques for Symbol Grounding with SATNet | 1 + ...ariance Pooling Networks for Video Recognition | 1 + .../neurips/Temporally Abstract Partial Models | 1 + ...ensor Normal Training for Deep Learning Models | 1 + ...r correlations by nonlinear Hebbian plasticity | 1 + ...Execution of Imperative Deep Learning Programs | 1 + ...odule for Model-Agnostic Domain Generalization | 1 + ...n with a Transformer for Human Pose Estimation | 1 + data/2021/neurips/Test-time Collective Prediction | 1 + ...labeled Test Instances for Deep Learning Tasks | 1 + data/2021/neurips/Testing Probabilistic Circuits | 1 + ...imator and a Paradox Concerning Logging Policy | 1 + ...ularization from SGD in Least Squares Problems | 1 + ...n: Expressiveness, Learnability, and Inference | 1 + ...etwork Learning: Revisiting the Superstructure | 1 + .../neurips/The Complexity of Sparse Tensor PCA | 1 + ...assive Learning in Deep Reinforcement Learning | 1 + ...on the Generalization of Quadratic Classifiers | 1 + .../neurips/The Elastic Lottery Ticket Hypothesis | 1 + ...s: Learning Zero-shot Segmentation from Videos | 1 + ...Duality of Adaptive Dropout and Regularization | 1 + ... Combinatorial Semi-bandits with Greedy Oracle | 1 + .../The Image Local Autoregressive Transformer | 1 + ...f Minima Stability: A View from Function Space | 1 + .../neurips/The Inductive Bias of Quantum Kernels | 1 + ...orithm is Universal on Strongly Convex Domains | 0 ... Networks: A Deep Gaussian Process Perspective | 1 + .../The Limits of Optimal Pricing in the Dark | 1 + .../neurips/The Many Faces of Adversarial Risk | 1 + ...ch Methods for Feature Importance Explanations | 1 + ...model selection for general Contextual Bandits | 1 + ... Few-Shot Classification and How to Infer Them | 1 + .../The Semi-Random Satisfaction of Voting Axioms | 1 + ...ant Neural Networks for Reinforcement Learning | 1 + ... for Differentially Private Federated Learning | 1 + ...ein Distance: Conic Formulation and Relaxation | 1 + ...Explainable AI in Ad Hoc Human-Machine Teaming | 1 + ...lue of Information When Deciding What to Learn | 1 + ...oice in distance-regularized domain adaptation | 1 + ...ersarial episodic MDPs with unknown transition | 1 + ...bandits and uncertainty to optimism and beyond | 1 + ...y embedding constructed from the $k$-Laplacian | 1 + ...s correlation with automatic evaluation scores | 1 + ...hways with self-supervised predictive learning | 1 + ...finite-depth-and-width limit at initialization | 1 + ...hierarchical structure can guide deep learning | 1 + ...for Reversibility-Aware Reinforcement Learning | 1 + ...mall: Do Language Models Distil Occam's Razor? | 1 + ...chastic Gradients, and Adaptive Learning Rates | 1 + ...d motion correction for Neuropixels recordings | 1 + ...r Stochastic Approximation with Fixed Stepsize | 1 + ...lization Error Bounds via Wasserstein Distance | 1 + ... 
Action Repetition for Policy Gradient Methods | 1 + ...ization Bounds for SGLD in Non-convex Settings | 1 + ...ime-series Generation by Contrastive Imitation | 1 + ...is a Question of Cooperation for Language GANs | 1 + ...ce-driven monocular 3D category reconstruction | 1 + ...d Alignment for Unsupervised Domain Adaptation | 1 + ...r: Adaptive Space-Time Tokenization for Videos | 1 + ...ocument Graph-based Neural Network Perspective | 1 + ...opicNet: Semantic Graph-Guided Topic Discovery | 1 + .../Topographic VAEs learn Equivariant Capsules | 1 + ...ological Attention for Time Series Forecasting | 1 + ...ological Detection of Trojaned Neural Networks | 1 + .../Topological Relational Learning on Graphs | 1 + ...arning for Semi-Supervised Node Classification | 1 + ...ll-Worlds Online Learning with Feedback Graphs | 0 ...bly Robust Models against Adversarial Examples | 1 + ... Biologically Plausible Convolutional Networks | 1 + ...iled Visual Recognition from Prior Perspective | 1 + ...Context-Agnostic Learning Using Synthetic Data | 1 + ...forcement Learning with Spectral Normalization | 1 + ...s Efficient and Effective Adversarial Training | 1 + ...ards Enabling Meta-Learning from Target Models | 1 + ...imization with Non-convex Followers and Beyond | 1 + ...y Selection for Offline Reinforcement Learning | 1 + ... Offline Reinforcement Learning with Pessimism | 1 + ...er Bounds on the Depth of ReLU Neural Networks | 1 + ...ained Explainability for Graph Neural Networks | 1 + ...polation: An Inductive Graph Learning Approach | 1 + ...g Self-Driving Perception Models in Simulation | 1 + .../Towards Robust Bisimulation Metric Learning | 1 + ...wards Robust and Reliable Algorithmic Recourse | 1 + ...se Retrieval with Sparse and Generative Priors | 1 + ...mple-efficient Overparameterized Meta-learning | 1 + ...Try-On via Patch-Routed Spatially-Adaptive GAN | 1 + ...eneralization Bounds for Structured Prediction | 1 + .../neurips/Towards Stable and Robust AdderNets | 1 + ...tion Lower Bounds for Distributed Optimisation | 1 + ...ulti-Agent Q-Learning with Value Factorization | 1 + ...okahead Generalizes Better Than SGD and Beyond | 1 + ...sity for Open-ended Learning in Zero-sum Games | 1 + ...ramework of Out-of-Distribution Generalization | 1 + ...ew of Adversarial Perturbations and Robustness | 0 ...rmation-Theoretic Framework for Generalization | 1 + ...hical memory for reinforcement learning agents | 1 + ...taining from prediction with OOD test examples | 1 + ...by multi-task learning on monkey visual cortex | 1 + ...standing retrosynthesis by energy-based models | 1 + .../Tracking People with 3D Representations | 1 + ... Without Re-recognition in Humans and Machines | 1 + ...arned Manifolds with Conformal Embedding Flows | 1 + ...table Regularization of Probabilistic Circuits | 1 + ...Networks with Efficient Local Lipschitz Bounds | 1 + ...licit Differentiation on the Equilibrium State | 1 + .../Training Neural Networks is ER-complete | 1 + ...aining Neural Networks with Fixed Sparse Masks | 1 + ...erized Models with Non-decomposable Objectives | 1 + ...nt Interpolation Loss to Generalize Along Time | 1 + ...Can Make One Strong GAN, and That Can Scale Up | 1 + ... 
Learning for Whole Slide Image Classification | 1 + ...ers for Generalizable Person Re-identification | 1 + ...tworks with Ego-graph Information Maximization | 1 + data/2021/neurips/Transformer in Transformer | 1 + ...ar RGB Scene Reconstruction using Transformers | 1 + ...ts and Can be Extended to Graphs & Hypergraphs | 1 + ...trategy for Single Image Reflection Separation | 1 + ...n Tree: from Decision Trees to Decision Graphs | 1 + ...n-centric Audio-visual Representation Learning | 0 .../True Few-Shot Learning with Language Models | 1 + .../Truncated Marginal Neural Ratio Estimation | 1 + ... Mixture of Normal-inverse Gamma Distributions | 1 + ...Networks via Zero-Shot Hyperparameter Transfer | 0 ... the Fly for Efficient Population Based AutoRL | 1 + ...of Bounded-Precision Recurrent Neural Networks | 1 + ...ivalence between robustness and regularization | 1 + ...gn of Spatial Attention in Vision Transformers | 1 + ...earning Evaluation: In vs. Out of Distribution | 1 + data/2021/neurips/Two steps to risk sensitivity | 1 + ...ided fairness in rankings via Lorenz dominance | 1 + ...esian optimization with inequality constraints | 1 + ...iance Fields for Dynamic Scene View Synthesis" | 1 + ...ms for Multinomial Logistic Regression Bandits | 1 + ...Modal Controls for Conditional Image Synthesis | 1 + ...Stochastic Combinatorial Optimization Problems | 1 + data/2021/neurips/Ultrahyperbolic Neural Networks | 1 + ... Examples: Designing Objects for Robust Vision | 1 + ...rough Non-negative Penalized Linear Regression | 1 + ...gh Bias-Contrastive and Bias-Balanced Learning | 1 + ...ecisions Facilitate Better Preference Learning | 1 + ...libration for Ensemble-Based Debiasing Methods | 1 + .../Uncertainty Quantification and Deep Ensembles | 1 + ...forcement Learning with Diversified Q-Ensemble | 1 + ...-Driven Loss for Single Image Super-Resolution | 1 + ...Integration In Deep Speech Recognition Systems | 1 + .../Understanding Bandits with Graph Feedback | 1 + ...cess in Over-parametrized Tensor Decomposition | 1 + ... Learning Methods as Implicit Parameterization | 1 + ...nding How Encoder-Decoder Architectures Attend | 1 + ... Interpretability of Variational Auto-Encoders | 1 + ...ocking Dynamics of Cooperative Rationalization | 1 + ...native Self-supervised Representation Learning | 1 + ...al Multi-Label Learning via Mutual Information | 1 + ... Early Stopping for Learning with Noisy Labels | 1 + ...Effect of Stochasticity in Policy Optimization | 1 + ...it of Model Invariance from a Data Perspective | 1 + ...upervised Domain Adaptation via Data Poisoning | 1 + ... Under-Coverage Bias in Uncertainty Estimation | 1 + ... 
Taylor's Approximations for Image Restoration | 1 +
 ...etraining Framework for Document Understanding | 1 +
 ...ward a Unified Framework for Robust Clustering | 1 +
 ...sian Width, Norm Bounds and Benign Overfitting | 1 +
 .../Uniform Sampling over Episode Difficulty | 1 +
 ...nt Learning with Linear Function Approximation | 1 +
 ...inforcement Learning via Off-Policy Evaluation | 1 +
 ...Methods for Quasi-Self-Concordant Optimization | 1 +
 ...s on prediction dimension of convex surrogates | 0
 ...alification Rate Disparities and Interventions | 1 +
 ...ique sparse decomposition of low rank matrices | 1 +
 ...ation Using Well-Conditioned Normalizing Flows | 1 +
 .../Universal Graph Convolutional Networks | 1 +
 data/2021/neurips/Universal Off-Policy Evaluation | 1 +
 ...rception Representations for Lossy Compression | 1 +
 .../neurips/Universal Semi-Supervised Learning | 1 +
 .../Unlabeled Principal Component Analysis | 1 +
 ...al Models via Contrast-Regularized Fine-Tuning | 1 +
 ...namics-Aware Rewards in Reinforcement Learning | 1 +
 ...eground Extraction via Deep Region Competition | 1 +
 ...ised Learning of Compositional Energy Concepts | 1 +
 ...resentation Learning with Capsule Autoencoders | 1 +
 ...by Discriminator-Constrained Optimal Transport | 1 +
 ...odels For 3D Partially Observable Environments | 1 +
 ...evel Representation Learning from Scene Images | 1 +
 ...Part Discovery from Contrastive Reconstruction | 1 +
 ...l Networks: I Believe I Can Distill On-the-Fly | 1 +
 data/2021/neurips/Unsupervised Speech Recognition | 1 +
 ...ially Private Learning via Correlated Sampling | 0
 ... and Repeated Measures in Deep Neural Networks | 1 +
 ...on Factorization with Variable Agent Sub-Teams | 1 +
 ...rvised Learning from Raw Video, Audio and Text | 1 +
 ...raph Neural Networks using Vector Quantization | 1 +
 ...icket Hypothesis with Inertial Manifold Theory | 1 +
 ...Replication Robust Volume-based Data Valuation | 1 +
 ... Evaluation with Linear Function Approximation | 1 +
 ...Sparse-Reward Cooperative Multi-Agent Problems | 1 +
 .../Variational Bayesian Optimistic Sampling | 1 +
 ...sian Reinforcement Learning with Regret Bounds | 1 +
 .../Variational Continual Bayesian Meta-Learning | 1 +
 ...or Continuous-Time Switching Dynamical Systems | 1 +
 .../neurips/Variational Model Inversion Attacks | 1 +
 ...Multi-Task Learning with Gumbel-Softmax Priors | 1 +
 ... Space of Symmetric Positive Definite Matrices | 1 +
 ...ifolds via Gauge Independent Projected Kernels | 1 +
 ...ddings for Articulated 3D Shape Reconstruction | 1 +
 ...Advanced by Exploring Intrinsic Inductive Bias | 1 +
 ...tanding via Video-Distilled Knowledge Transfer | 1 +
 ...n using Inter-Frame Communication Transformers | 1 +
 ...ess for Coordination Detection on Social Media | 1 +
 ...al Imitation Learning using Variational Models | 1 +
 ... Nets and Humans Share Similar Inherent Biases | 1 +
 ...rgence of Intermediate Visual Patterns in DNNs | 1 +
 .../VoiceMixer: Adversarial Voice Style Mixup | 1 +
 .../Volume Rendering of Neural Implicit Surfaces | 1 +
 ...uction of Multiple Objects from a Single Image | 1 +
 ...sis of Representation Learning in Actor-Critic | 1 +
 ...grained Classification via Similarity Transfer | 1 +
 ...for offline model-based reinforcement learning | 1 +
 ...Weisfeiler and Lehman Go Cellular: CW Networks | 1 +
 ...ll-tuned Simple Nets Excel on Tabular Datasets | 1 +
 ...i-Modal Learning Better than Single (Provably) | 1 +
 ...at Matters for Adversarial Imitation Learning? | 1 +
 ...al networks actually say about generalization? | 1 +
 ...aining reveals about neural network complexity | 1 +
 ...ood imputation to predict with missing values? | 1 +
 ...When Are Solutions Connected in Deep Networks? | 1 +
 ... Trainability: Fewer than $n$ Neurons Can Work | 1 +
 ...mization with Low FPR for Multipartite Ranking | 1 +
 ...eneralizable Reinforcement Learning Tractable? | 1 +
 ...When Is Unsupervised Disentanglement Possible? | 1 +
 ...ial Robustness from Pretraining to Finetuning? | 1 +
 ...tainty Quantification for Epidemic Forecasting | 1 +
 ...earning Objectives are Sufficient for Control? | 1 +
 ...s and Who Follows in Strategic Classification? | 1 +
 ... Functions Lead to Less Transferable Features? | 1 +
 ...m Tasks? An Analysis of Head and Prompt Tuning | 1 +
 ...emic POMDPs and Implicit Partial Observability | 1 +
 ...of Sample Complexity on Sparse Neural Networks | 0
 ...ion Stabilizes GANs: Analysis and Improvements | 1 +
 ...xplanation and Context-Aware Data Augmentation | 1 +
 ...s and Heuristics Over the Atari-2600 Benchmark | 1 +
 ...gregation of Voter Information and Preferences | 1 +
 ...s Functions for Diachronic Word Representation | 1 +
 .../XCiT: Cross-Covariance Image Transformers | 1 +
 ...uble Oracle Algorithm for Extensive-Form Games | 1 +
 ...wn Papers: An Owner-Assisted Scoring Mechanism | 1 +
 data/2021/neurips/You Never Cluster Alone | 1 +
 ...Transformer in Vision through Object Detection | 1 +
 ...t! Making a lottery ticket claim its ownership | 1 +
 ...al-driven models of the primate dorsal pathway | 1 +
 ...ling Predictions in Early Exit Neural Networks | 1 +
 data/2021/neurips/argmax centroid | 0
 ...cient Lossless Compression via a Uniform Coder | 1 +
 ...gh-dimensional Neural Tangent Kernel Approach" | 1 +
 ...ass-Contrastive Back-Propagation Explanations" | 1 +
 ...antitative Study of Scalability with Dimension | 1 +
 ...domized Smoothing for Decision Stump Ensembles | 1 +
 ...ine Bipartite Matching with Degree Information | 1 +
 ...y for Self-training and Hyper-parameter Tuning | 1 +
 .../neurips/3D Concept Grounding on Neural Fields | 1 +
 ...Framework for Debugging Computer Vision Models | 1 +
 ...egular Latent Grids for 3D Generative Modeling | 1 +
 ...ing Semantic Novelty Detection on Point Clouds | 1 +
 .../2022/neurips/4D Unsupervised Object Discovery | 1 +
 ...A Benchmark for Compositional Visual Reasoning | 1 +
 ...ds Algorithm for Bandits with Delayed Feedback | 1 +
 .../A Boosting Approach to Reinforcement Learning | 1 +
 data/2022/neurips/A Causal Analysis of Harm | 1 +
 ... for Non-Autoregressive Sentence Summarization | 1 +
 ...pervised Adversarially Robust PAC Learnability | 1 +
 ...n: Stability, Robustness, and Inductive Biases | 1 +
 .../neurips/A Closer Look at Offline RL Agents | 0
 ...e Classifier for Few-shot Image Classification | 1 +
 ...ly-Supervised Audio-Visual Source Localization | 1 +
 ...ersarial Robustness of Deep Equilibrium Models | 1 +
 ...e on the Optimization of Shallow ReLU Networks | 1 +
 ...ng Algorithm for Training Deep Neural Networks | 1 +
 ...ear Convergence for Federated Minimax Learning | 1 +
 ...le Graph Training: Benchmarking and Rethinking | 1 +
 ...r Sparse Logistic Regression in High-Dimension | 1 +
 ...iable Lp Canonical Calibration Error Estimator | 1 +
 ...for Support Vector Machines via Data Reduction | 0
 ...s Time Framework for Discrete Denoising Models | 1 +
 ...ntrastive Framework for Neural Text Generation | 1 +
 ...rity for Practical Vertical Federated Learning | 1 +
 ... Analytical Moments And Sampling-Free Training | 0
 ... Development Goal of Safe Working Environments | 1 +
 ...arning Dataloader with Shared Data Preparation | 1 +
 ...ement Learning Framework for Column Generation | 1 +
 ...babilistic Embedding for Cross-Modal Retrieval | 1 +
 ...e fPTAS for the Minimum Enclosing Ball Problem | 1 +
 ...ation of AIXI Using Logical State Abstractions | 1 +
 ...st-Training Pruning Framework for Transformers | 1 +
 ...-negative Least Squares with Non-negative Data | 1 +
 ... RL with Resets and Linear Value Approximation | 1 +
 .../A Fourier Approach to Mixture Learning | 1 +
 ...diting Differentially Private Machine Learning | 1 +
 ...metric Perspective on Variational Autoencoders | 1 +
 ...mputational Linguistics and Political Analysis | 1 +
 ...istic for Assessing Implicit Generative Models | 1 +
 ...Lagrangian Duality Approach to Active Learning | 1 +
 ...e Search Dataset for Unbiased Learning to Rank | 1 +
 .../A Lower Bound of Hash Codes' Performance | 1 +
 ...esource Management with Function Approximation | 1 +
 ...prises for Unsupervised Reinforcement Learning | 1 +
 ... U-Nets with Applications to Hierarchical VAEs | 1 +
 ...anguage Understanding and Judgement Prediction | 1 +
 ...mework for Approximate Nearest Neighbor Search | 1 +
 ...rithm for Online Learning with Feedback Graphs | 1 +
 ...al-Dual Method for Off-Policy Learning in CMDP | 1 +
 ...A Neural Corpus Indexer for Document Retrieval | 1 +
 ... Learning Algorithm to Reduce Label Complexity | 1 +
 ...lization Bounds Using Samplewise Evaluated CMI | 1 +
 ...for High-Dimensional Generalized Linear Models | 1 +
 ...of Non-parametric Temporal-Difference Learning | 1 +
 ... Generalization Bound for Equivariant Networks | 1 +
 ...on Approach for Offline Reinforcement Learning | 1 +
 .../A Practical, Progressively-Expressive GNN | 1 +
 ...tic Graph Coupling View of Dimension Reduction | 1 +
 ...tochastic Multi-level Composition Optimization | 1 +
 ...trol Variates and Adaptive Importance Sampling | 1 +
 ...eometric Approach to Neural-Network Smoothness | 1 +
 ...ary Approach for Debiasing Multiclass Datasets | 1 +
 ...A Regret-Variance Trade-Off in Online Learning | 1 +
 ...harpness Measure Based on Information Geometry | 1 +
 ...r Corruption-Tolerant Gaussian Process Bandits | 1 +
 ...ution for Hierarchical Representation Learning | 1 +
 ...n Algorithm for Training Optimal Decision Tree | 0
 ...mple Approach to Automated Spectral Clustering | 1 +
 .../A Simple Decentralized Cross-Entropy Method | 1 +
 ...Learning with Safety against Heavy-tailed Risk | 1 +
 ...ynchronous Federated Contextual Linear Bandits | 1 +
 ... Approximation with Multiple Coupled Sequences | 1 +
 ... Scalable Learning in Neural ILP Architectures | 1 +
 .../A Spectral Approach to Item Response Theory | 1 +
 ... Approach in Averaged Stochastic Approximation | 0
 ... Method for Decentralized Bilevel Optimization | 1 +
 ...ublicly Available US Criminal Justice Datasets | 1 +
 ...A Theoretical Framework for Inference Learning | 1 +
 ...heoretical Study on Solving Continual Learning | 1 +
 ...f Gradient Bias in Meta-Reinforcement Learning | 1 +
 ...heoretical View on Sparsely Activated Networks | 1 +
 ... Learnability under Transformation Invariances | 1 +
 ...ctor with Coarse-Fine Crossing Representations | 1 +
 ...d Learning with Arbitrary Client Participation | 1 +
 ...Data Augmentation: A Loss Function Perspective | 1 +
 ...ce Theorem for Stochastic Optimization Methods | 1 +
 ... Measure for Multiagent Reinforcement Learning | 0
 ...l Backdoor Learning: Frameworks and Benchmarks | 1 +
 ...ing Offline Model Training and Policy Learning | 1 +
 ...Unified Framework for Deep Symbolic Regression | 0
 ...amework for Solving Geometrically Complex PDEs | 1 +
 ...nified Model for Multi-class Anomaly Detection | 1 +
 .../A Unified Sequence Interface for Vision Tasks | 1 +
 ...Online Optimization with Long-Term Constraints | 1 +
 ...f Off-Policy General Value Function Evaluation | 1 +
 ...t Predictions Applied to Online Graph Problems | 1 +
 ...nt of Anderson Mixing with Minimal Memory Size | 1 +
 ...l for Supervised Graph Representation Learning | 1 +
 ... Sparse and Robust Pre-trained Language Models | 1 +
 ...ady-state simulations on high-resolution grids | 1 +
 .../A consistently adaptive trust-region method | 1 +
 .../neurips/A contrastive rule for meta-learning | 1 +
 ...astic and global variance reduction algorithms | 1 +
 ...ero-order optimization with two point feedback | 1 +
 ...Lipschitz functions in high and low dimensions | 1 +
 ... dataset for multilingual keyphrase generation | 0
 .../A permutation-free kernel two-sample test | 1 +
 ...F result with applications in network modeling | 1 +
 ...ntinual learning: Repeated Augmented Rehearsal | 1 +
 ...ry of weight distribution-constrained learning | 0
 ...ormation encoding in recurrent neural networks | 1 +
 ...ted Attacker for Boosting Adversarial Training | 1 +
 ...h Absolute Memorization and Privacy Protection | 1 +
 ... Dropout for Robust Language Model Fine-Tuning | 1 +
 .../neurips/ADBench: Anomaly Detection Benchmark | 1 +
 ...e Replay for Incremental Semantic Segmentation | 1 +
 ...hical Learning for Composite Multi-Agent Tasks | 1 +
 ...hmark for Versatile Medical Image Segmentation | 1 +
 ...rallel Strategies with Heterogeneity Awareness | 1 +
 ...tion Network for Click-Through Rate Prediction | 1 +
 ...chmark for Animal Pose Estimation and Tracking | 1 +
 ...aptive Skill Priors for Reinforcement Learning | 1 +
 ...ng CP Tensor Decomposition by Self Supervision | 1 +
 ...n for Compute-Efficient Hyper-parameter Tuning | 1 +
 ...anguage Embodied Navigation in 3D Environments | 1 +
 ...signal uncorrelation on spatio-temporal graphs | 1 +
 ...place Approximation for Bayesian Deep Learning | 1 +
 ...e Saddle-Point Problems with Bilinear Coupling | 1 +
 ...for Sparsity Constrained Optimization Problems | 1 +
 ...etworks (PINNs) using Meshless Discretizations | 1 +
 ...ied Robustness Training via Knowledge Transfer | 1 +
 ...onditioned Huge-Scale Online Matrix Completion | 1 +
 ...e Convolution with Column Vector-Wise Sparsity | 1 +
 .../Acceleration in Distributed Sparse Regression | 0
 ...ivity arises from distributed control policies | 1 +
 ...sing Wearable Sensors in a Kitchen Environment | 1 +
 .../2022/neurips/Active Bayesian Causal Inference | 1 +
 ...Exploration for Inverse Reinforcement Learning | 1 +
 ...ctive Labeling: Streaming Stochastic Gradients | 1 +
 ...elps Pretrained Models Learn the Intended Task | 1 +
 ...Active Learning Polynomial Threshold Functions | 1 +
 .../Active Learning Through a Covering Lens | 1 +
 .../Active Learning for Multiple Target Models | 0
 ...ing of Classifiers with Label and Seed Queries | 1 +
 ...tworks: Insights from Nonparametric Statistics | 1 +
 .../Active Learning with Safety Constraints | 1 +
 ...Ranking without Strong Stochastic Transitivity | 1 +
 ...g Approach to Label-Efficient Model Evaluation | 1 +
 ...bilities of Deep Learning-based Stereo Methods | 1 +
 ...daFocal: Calibration-aware Adaptive Focal Loss | 1 +
 ...verge Without Any Modification On Update Rules | 1 +
 ...n Transformers for Scalable Visual Recognition | 1 +
 ...ayesian Inference in Attractor Neural Networks | 1 +
 ...bing Attention-Conditioned Masking Consistency | 1 +
 ...to Online Label Shift with Provable Guarantees | 1 +
 ...ive Data Debiasing through Bounded Exploration | 1 +
 ...t Learning with Hierarchical Optimal Transport | 1 +
 ...e Interest for Emphatic Reinforcement Learning | 1 +
 ...n for Learning Latent Space Energy-based Model | 1 +
 .../Adaptive Oracle-Efficient Online Learning | 1 +
 data/2022/neurips/Adaptive Sampling for Discovery | 1 +
 ...duction for Non-convex Finite-Sum Minimization | 1 +
 ...ly Exploiting d-Separators with Causal Bandits | 1 +
 ...table Multiple Instance Learning for Pathology | 1 +
 ...ddressing Leakage in Concept Bottleneck Models | 1 +
 ...al Pretraining and Unified-Vocabulary Datasets | 1 +
 ...Gaussian process driven differential equations | 1 +
 ...ferable Adversarial Attack on Face Recognition | 1 +
 ...ancing Model Pruning via Bi-level Optimization | 1 +
 ...o Mitigate Black-Box Score-Based Query Attacks | 1 +
 ...resentation Learning Principle Guided Approach | 1 +
 .../neurips/Adversarial Reprogramming Revisited | 1 +
 ...arial Robustness is at Odds with Lazy Training | 1 +
 ...or Domain Generalized Urban-Scene Segmentation | 1 +
 ...Adversarial Task Up-sampling for Meta-learning | 1 +
 ...n the Benefit of Gradually Informative Attacks | 1 +
 ...ducing Confidence Along Adversarial Directions | 1 +
 ...versarial training for high-stakes reliability | 1 +
 ...c Minimax Optimal Learner and Characterization | 1 +
 ...etable Assessment of Implicit Graph Generators | 1 +
 ...ce of Neural Networks under Distribution Shift | 1 +
 ...ting Reynolds-Averaged Navier-Stokes Solutions | 1 +
 ...rning Linear Thresholds from Label Proportions | 1 +
 ...mate Data Removal: New Results and Limitations | 1 +
 .../neurips/Algorithms with Prediction Portfolios | 1 +
 ...lustering with Anchor Matching Correspondences | 1 +
 ...rains with fused unbalanced Gromov Wasserstein | 1 +
 ...emporal Attention for Video Action Recognition | 1 +
 ...ics is Local: Redistricting via Local Fairness | 1 +
 ...se\" in Deep Topic Models via Policy Gradient" | 1 +
 ... Attacks on Variational Autoencoders with MCMC | 1 +
 ...earning by Removing Projection to the Centroid | 1 +
 ...g Mirror Descent for Constrained Min-Max Games | 1 +
 ...dgments for Robust Visual Event Classification | 1 +
 ...rtized Inference for Causal Structure Learning | 1 +
 ...ce for Heterogeneous Reconstruction in Cryo-EM | 1 +
 ...tized Mixing Coupling Processes for Clustering | 1 +
 ...ation for Sliced Wasserstein Generative Models | 1 +
 data/2022/neurips/Amortized Proximal Optimization | 1 +
 ...lifying Membership Exposure via Data Poisoning | 1 +
 ...ary Environments with Piecewise Stable Context | 1 +
 ...rated Learning of Heterogeneous Causal Effects | 1 +
 ...or Learning Switched Linear Dynamics from Data | 1 +
 .../2022/neurips/An Analysis of Ensemble Sampling | 1 +
 ...urriculum Learning in Teacher-Student Networks | 1 +
 ...tched Algorithm for the Dueling Bandit Problem | 1 +
 ... Approach to Semi-Supervised Few-Shot Learning | 1 +
 ...nglement of Negative-free Contrastive Learning | 1 +
 ...n In-depth Study of Stochastic Backpropagation | 1 +
 ...ormation-Theoretic Framework for Deep Learning | 1 +
 ...to Whitening Loss for Self-supervised Learning | 1 +
 ... ultra-large combinatorial synthesis libraries | 1 +
 ... compute-optimal large language model training | 0
 ...tric Properties for Graph Contrastive Learning | 1 +
 ...ypothesis from PAC-Bayesian Theory Perspective | 1 +
 ...: Progressive Sharpening and Edge of Stability | 1 +
 ...ent for Multi-Objective Reinforcement Learning | 1 +
 ...-Aware Face Image Generation for Video Avatars | 1 +
 ...sual Correspondence from Open Source 3D Movies | 1 +
 ...d Super-Resolution Models for Animation Videos | 1 +
 ... of Spurious Minima in Two-Layer ReLU Networks | 1 +
 ...t Benchmark for Unsupervised Anomaly Detection | 1 +
 ...ized Histograms in Intermediate Privacy Models | 1 +
 .../Anonymous Bandits for Multi-User Systems | 1 +
 ... Performativity by Predicting from Predictions | 1 +
 ...Based Generative Models for Protein Structures | 1 +
 ...ime-Valid Inference For Multinomial Count Data | 1 +
 ...with Application to Gradient-Free Optimization | 1 +
 ...ths and distances beyond Johnson-Lindenstrauss | 1 +
 ...ations for the Cubic Regularization Subproblem | 1 +
 data/2022/neurips/Approximate Value Equivalence | 1 +
 ...lev Space: with Applications to Classification | 1 +
 ...s in High Dimensions Under Minimal Assumptions | 1 +
 ...s Created Equal: A Neural Collapse Perspective | 1 +
 ...ke Agents Robust to Adversarial Perturbations? | 1 +
 ...Are Defenses for Graph Neural Networks Robust? | 1 +
 data/2022/neurips/Are GANs overkill for NLP? | 1 +
 ...ng Disparate Treatment in Fair Neural Networks | 1 +
 ...lation for Fingerprinting Deep Neural Networks | 1 +
 ...ive Sparse Labeling for Video Action Detection | 1 +
 ...rning to Leverage an Expert for Embodied Tasks | 1 +
 ...stribution Generalization in Transfer Learning | 1 +
 ...tive Teaching of Motor Control Tasks to Humans | 1 +
 ...ir Effects in Video through Coordination Games | 1 +
 ...Multi-Task Classification with Category Shifts | 1 +
 ...Scaling Makes Larger Networks Teach Well Again | 1 +
 ...ic Approximation: A Jump Diffusion Perspective | 1 +
 ...ies for Bayesian Neural Network in Besov Space | 1 +
 ...Partial AUC Optimization: Theory and Algorithm | 1 +
 ...asserstein distances in the small noise regime | 1 +
 ... \342\204\2232 Regularized Network Embeddings" | 1 +
 ...-Critic for Multi-Agent Reinforcement Learning | 1 +
 ...SGD Beats Minibatch SGD Under Arbitrary Delays | 1 +
 ...sformers via Attentive Class Activation Tokens | 1 +
 .../Attention-based Neural Cellular Automata | 1 +
 ...ple Approach for Source-free Domain Adaptation | 1 +
 ...udio-Driven Co-Speech Gesture Video Generation | 1 +
 ...ontrastive Learning: Fabricated and Generative | 1 +
 ...r Adaptive Control of Linear Quadratic Systems | 1 +
 ...etons and Object Outlines by Linking Keypoints | 1 +
 data/2022/neurips/AutoML Two-Sample Test | 1 +
 ... for Novelty Detection with Error Rate Control | 1 +
 ...k for Automating Efficient Multi-Task Learning | 1 +
 ...niversal Modeling of Spatio-temporal Sequences | 1 +
 ...ing Automated Weak Supervision with 100 Labels | 1 +
 .../Autoformalization with Large Language Models | 1 +
 ...Uncertainty Aware Inversion of Neural Networks | 1 +
 ...entiation of Programs with Discrete Randomness | 1 +
 ...ferentiation of nonsmooth iterative algorithms | 1 +
 ...utoregressive Perturbations for Data Poisoning | 1 +
 ... Generating Substrings as Document Identifiers | 1 +
 ...ralization Using Procedurally Generated Worlds | 1 +
 .../Average Sensitivity of Euclidean k-Clustering | 1 +
 ...ex Optimization with Communication Compression | 1 +
 ...imple and Robust LiDAR-Camera Fusion Framework | 1 +
 ...t Algorithm for Joint Alignment of Time Series | 0
 ...l Architecture Search Benchmark and Algorithms | 1 +
 ...m Update for Continual Video-Language Modeling | 1 +
 ...ation Made Easy: A Simple First-Order Approach | 1 +
 ...ier Node Detection on Static Attributed Graphs | 1 +
 ...as Reduced Self-Normalized Importance Sampling | 1 +
 ...xplore: Exploration by Bootstrapped Prediction | 1 +
 ...er Learning by Self-Sparsified Backpropagation | 1 +
 ...A Comprehensive Benchmark of Backdoor Learning | 1 +
 ...Prompt: Backdoor Attacks on Continuous Prompts | 1 +
 ...ip: A Certified Defense Against Data Poisoning | 1 +
 ...d Directed Evolution for Sequence Optimization | 1 +
 ...utations using the Acquisition Weighted Kernel | 1 +
 ...n via density-ratio estimation with guarantees | 1 +
 ...delity Active Learning with Budget Constraints | 1 +
 .../Batch size-invariance for policy optimization | 1 +
 ...ilistically Triggered Arms or Independent Arms | 1 +
 ...Learnable Predictive Coding Associative Memory | 1 +
 ...earning with Fully Bayesian Gaussian Processes | 1 +
 ... a Mixture of Dynamic Poisson Factor Analyzers | 1 +
 ...oration for Model-based Reinforcement Learning | 1 +
 ...ed Spaces via Probabilistic Reparameterization | 1 +
 .../Bayesian Persuasion for Algorithmic Recourse | 1 +
 .../Bayesian Risk Markov Decision Processes | 1 +
 ...Nonlinear Dynamics with Quantified Uncertainty | 1 +
 ...ayesian inference via sparse Hamiltonian flows | 1 +
 ...Transformers: Cloning $k$ modes with one stone | 1 +
 ...onalization for Offline Reinforcement Learning | 1 +
 ...ct Models through the Lens of Interpretability | 1 +
 ...an Pose and Shape Estimation Beyond Algorithms | 1 +
 ...ient and collaborative optimization benchmarks | 1 +
 ...ise in Composing Classes with Bounded Capacity | 1 +
 ...Permutation-Equivariance in Auction Mechanisms | 1 +
 ...ing in Two-layer Convolutional Neural Networks | 1 +
 ...gn Underfitting of Stochastic Gradient Descent | 1 +
 ...phic: Toward a Refined Taxonomy of Overfitting | 0
 ...nsmission Effects in Multi-Mode Optical Fibres | 1 +
 .../neurips/Best of Both Worlds Model Selection | 1 +
 ...Worlds Bounds for Bandits with Switching Costs | 1 +
 .../Better SGD using Second-order Momentum | 1 +
 ...ia Proper Scores for Classification and Beyond | 1 +
 ...ization: Improved Regret Bounds via Smoothness | 1 +
 ...ti-Class Prediction via Information Projection | 1 +
 ... decision-making in heterogeneous environments | 1 +
 ...L1: Faster and Better Sparse Models with skglm | 1 +
 ...Mahalanobis Distance for Textual OOD Detection | 0
 ...nual Learning with Backward Knowledge Transfer | 1 +
 ...pirical Study of Node Classification with GNNs | 1 +
 ...tive on Offline Multiagent Behavioral Analysis | 1 +
 ...tive Representations to Related Subpopulations | 1 +
 ... via Clairvoyant Multiplicative Weights Update | 1 +
 ...bio-plausible temporal credit assignment rules | 1 +
 ...Parameter learning for the deviated components | 1 +
 ...ws: beating power law scaling via data pruning | 1 +
 ...role of the topology in decentralized learning | 1 +
 ...unctional Estimation in Infinite-Armed Bandits | 1 +
 ...r User-specified Error-measuring Distributions | 1 +
 ...zier Gaussian Processes for Tall and Wide Data | 1 +
 ...tillation for Whole Slide Image Classification | 1 +
 ...chitectures for Vision Multi-Layer Perceptrons | 1 +
 ...Robustly Binarized Multi-distilled Transformer | 1 +
 ...ffline Infinite-width Model-based Optimization | 1 +
 ...Centric Biomedical Natural Language Processing | 1 +
 ...obabilistic Model for Binaural Audio Synthesis | 1 +
 ...e Representations of Commuting Transformations | 1 +
 ...Dynamic Thresholds for Spiking Neural Networks | 1 +
 ...ons for spiking networks with efficient coding | 1 +
 ...rks for Blind Separation of Correlated Sources | 1 +
 ... arbitrary timespans via local neuromodulators | 1 +
 ... Classification with Optimal Label Permutation | 1 +
 ...ralization: Stability of Zeroth-Order Learning | 1 +
 .../Black-box coreset variational inference | 1 +
 ...Blackbox Attacks via Surrogate Ensemble Search | 1 +
 ...ave Flatter Landscape Around the True Solution | 1 +
 data/2022/neurips/Block-Recurrent Transformers | 1 +
 ...s: A New Perspective on Adversarial Robustness | 1 +
 ...f-distribution Detection with Typical Features | 1 +
 ... Network Frameworks with Log-supermodular CRFs | 1 +
 ... Attacks with Reverse Adversarial Perturbation | 1 +
 ...Transformer for Offline Reinforcement Learning | 1 +
 ...rediction Error, Constraints, and Nonlinearity | 1 +
 ...ersectional Fairness through Marginal Fairness | 1 +
 data/2022/neurips/Brain Network Transformer | 1 +
 ...ratively Solvable Problems in Predict+Optimize | 0
 ... Dataset for Geometric Fracture and Reassembly | 1 +
 ...chitecture Spaces via A Cross-Domain Predictor | 1 +
 ...rential Privacy in Data Acquisition Mechanisms | 1 +
 ...onvolutional Neural Networks on Small Datasets | 1 +
 ... Representations for Open-Vocabulary Detection | 1 +
 ...es in Non-contrastive Self-supervised Learning | 0
 ...valuation of Neural Network Binary Classifiers | 1 +
 ...tially Private Stochastic Minimax Optimization | 1 +
 ...eo via Frame-Clip Consistency of Object Tokens | 1 +
 ...mizing Privacy Subject to Accuracy Constraints | 1 +
 data/2022/neurips/Byzantine Spectral Ranking | 1 +
 ...Gaussian process regression for streaming data | 1 +
 ...-Mixup: Improving Generalization in Regression | 1 +
 ...Networks for Precise Probabilistic Forecasting | 1 +
 ...ating Multimodal Referring Expression Datasets | 0
 ...ouping for 3D Object Detection on Point Clouds | 1 +
 ...Classification and Regression Diffusion Models | 1 +
 ...from Simulation to multiple Real-World Domains | 1 +
 ...tegory-agnostic Skeletal Animal Reconstruction | 1 +
 ...ext Generation APIs via Conditional Watermarks | 1 +
 data/2022/neurips/CCCP is Frank-Wolfe in disguise | 1 +
 ...s of Real-World Concepts on NLP Model Behavior | 1 +
 ...ons for Optical Chemical Structure Recognition | 1 +
 ...for Reinforcement Learning with Demonstrations | 1 +
 ...: Benchmark Tasks for Continual Graph Learning | 1 +
 ...MLE for Multimodal Conditional Image Synthesis | 1 +
 ...nerative Counterfactual Explanations on Graphs | 1 +
 ...ibing Physical and Causal Events the Human Way | 1 +
 ...wing Synthesis through Language-Image Encoders | 1 +
 ...opfield Networks with InfoLOOB Outperform CLIP | 1 +
 ...arning Benchmark for Vision-and-Language Tasks | 1 +
 ...trained Text Generation with Langevin Dynamics | 1 +
 ...ey Values for Data Valuation in Classification | 1 +
 data/2022/neurips/CUP: Critic-Guided Policy Reuse | 1 +
 ... Resampling for Training Recommender Retriever | 1 +
 ...d for Generalized 3D Deformation and Animation | 1 +
 ...rated Adversarial Training with Label Skewness | 1 +
 ...Constraints with Exact Satisfaction Guarantees | 1 +
 ...raining Be Manipulated By Non-Robust Features? | 1 +
 ...etworks Help Solve the Maximum Clique Problem? | 1 +
 ...enerative Models Fit Multimodal Distributions? | 1 +
 ...rge Language Models via Human Cognitive Biases | 1 +
 ...Capturing Graphs with Hypo-Elliptic Diffusions | 1 +
 ...Training in Extreme Multi-label Classification | 1 +
 ...Supervised Learning Approach and A New Dataset | 1 +
 ...ts Under the Sparse Mechanism Shift Hypothesis | 1 +
 ...t Variable Models Subject to Measurement Error | 1 +
 ...valence: Calculus, Algorithm, and Completeness | 1 +
 ...ith Non-IID Data using Linear Graphical Models | 1 +
 ...n and Classification using Neurochaos Learning | 1 +
 ...Structure Discovery for Reinforcement Learning | 1 +
 ...ated multi-shortcut identification and removal | 0
 ...tworks for Distribution-Free Survival Analysis | 1 +
 ...on under Orthogonal Gromov-Wasserstein Threats | 1 +
 ...onal Fairness with Subpopulation Decomposition | 1 +
 ...in of Thought Imitation with Procedure Cloning | 1 +
 ...ing Elicits Reasoning in Large Language Models | 1 +
 ...n Assumptions in Convex Reinforcement Learning | 1 +
 ...ry from Spatio-temporal Remote Sensing Imagery | 1 +
 ...nd Dense Functional Data in General Dimensions | 1 +
 ... Intrinsic to Neural Network Training with SGD | 1 +
 ...iled Limits for Deterministic Gradient Descent | 1 +
 ...wards Rigorous Benchmarking of Language Models | 1 +
 ...sk for Locally Strongly Convex Population Risk | 1 +
 ...with Response-Optimized Neural Encoding Models | 1 +
 ...atasets for UTXO and Account-based Blockchains | 1 +
 ...ndom Tables: Non-Trigonometric Random Features | 1 +
 ... Shortcut Learning with Generative Classifiers | 1 +
 .../Chromatic Correlation Clustering, Revisited | 1 +
 ...al Transformers for Medical Image Segmentation | 1 +
 ...Learning with Cycle-Consistency Regularization | 1 +
 ...on Enabling Robustness on Efficient Inferences | 1 +
 ...riational Inequalities with Heavy-Tailed Noise | 1 +
 ...ed Designs for One-Sided Bipartite Experiments | 1 +
 ...gregate: Face Recognition with Large Probe Set | 1 +
 ...ve Learning for Imbalanced Node Classification | 1 +
 .../CoNSoLe: Convex Neural Symbolic Learning | 1 +
 .../CoNT: Contrastive Neural Text Generation | 1 +
 ...llaborative Inference via Feature Purification | 1 +
 ...guage Pre-training with Fusion in the Backbone | 1 +
 ...trained Models and Deep Reinforcement Learning | 1 +
 ...ansform for Generalizable Deep Metric Learning | 1 +
 ...Image Generation via Hierarchical Transformers | 1 +
 ...ative Decision Making Using Action Suggestions | 1 +
 ...e Learning by Detecting Collaboration Partners | 1 +
 ...er Heterogeneity and Communication Constraints | 1 +
 ...Adversarial Agents: Near-Optimal Regret Bounds | 1 +
 ...icient Message Passing for 3D Molecular Graphs | 1 +
 ...glement and Segmentation via Image Composition | 1 +
 ...MU: Dataset for Combinatorial Music Generation | 1 +
 ...ear Constraints: Beyond Knapsacks and Fairness | 1 +
 ...zation for Efficient Learning in Deep Networks | 1 +
 ...cating Natural Programs to Humans and Machines | 1 +
 ...ted Primal-Dual Algorithm with an Inexact Prox | 1 +
 ...ted Learning for Kernelized Contextual Bandits | 1 +
 ...erated Learning for Generalized Linear Bandits | 1 +
 ...entralized Learning with $O(1)$ Consensus Rate | 1 +
 ...nspace estimation with arbitrary node failures | 1 +
 ...mposite Feature Selection Using Deep Ensembles | 1 +
 ... Theorems for Interactive Differential Privacy | 1 +
 ...Study on Disentanglement and Emergent Language | 1 +
 ...ations in human and artificial neural networks | 1 +
 ...omposable NeRF via Rank-residual Decomposition | 1 +
 ...Reinforcement Learning for Linear Mixture MDPs | 1 +
 ...ata Encoding in Parameterized Quantum Circuits | 1 +
 ...lized Framework For Concept-Based Explanations | 1 +
 ...: Beyond the Accuracy-Explainability Trade-Off | 1 +
 ...: Generalized Score Matching for Discrete Data | 1 +
 ...ional Diffusion Process for Inverse Halftoning | 1 +
 ...stic Data and Applications to Causal Discovery | 1 +
 ...tional Meta-Learning of Linear Representations | 1 +
 ... Free-Standing Social Interactions in the Wild | 1 +
 ...ence-based Reliable Learning under Dual Noises | 1 +
 .../neurips/Confident Adaptive Language Modeling | 1 +
 ...formal Frequency Estimation with Sketched Data | 1 +
 ...al Off-Policy Prediction in Contextual Bandits | 1 +
 ... Prediction with Temporal Quantile Adjustments | 1 +
 ...Conformalized Fairness via Quantile Regression | 1 +
 ...ting Image Data Privacy with Causal Confounder | 1 +
 ...r Efficient Model-Based Reinforcement Learning | 1 +
 ...ng under Graph Induced Fair Planted Partitions | 1 +
 ...ting Ensembles via the Manifold-Hilbert Kernel | 1 +
 ...ng the decision of any classifier or regressor | 0
 data/2022/neurips/Constants of motion network | 1 +
 ...r Zero-Shot Transfer in Reinforcement Learning | 1 +
 ...rithms with L-mixing External Random Variables | 1 +
 ...ally Plausible Model of the Cortical Hierarchy | 1 +
 ... Optimization with State-dependent Markov Data | 1 +
 ...rojection Approach to Safe Policy Optimization | 1 +
 ...tems of Linear Ordinary Differential Equations | 1 +
 .../Contact-aware Human Motion Forecasting | 1 +
 ...mic Pricing with Partially Linear Demand Model | 0
 ... Bandits with Knapsacks for a Conversion Model | 1 +
 ...Explore-then-UCB Strategy and Improved Regrets | 1 +
 ...on for Efficient Few-Shot Image Classification | 1 +
 ...g In Environments With Polynomial Mixing Times | 1 +
 ...tinual Learning with Evolving Class Ontologies | 1 +
 ...icient algorithm, and fundamental obstructions | 1 +
 ...blems: Normalized Advantage Functions Analysis | 1 +
 ... Homomorphisms and Homomorphic Policy Gradient | 1 +
 .../neurips/Continuously Tempered PDMP samplers | 1 +
 ...Adapters for Foundation Model Group Robustness | 1 +
 ... via Information Bottleneck for Recommendation | 1 +
 ...guage-Image Pre-Training with Knowledge Graphs | 1 +
 ...ing as Goal-Conditioned Reinforcement Learning | 1 +
 .../neurips/Contrastive Neural Ratio Estimation | 1 +
 ...er Global and Local Spectral Embedding Methods | 1 +
 ...s with Conditional Generative Occupancy Fields | 1 +
 ...ext Generation with Neurally-Decomposed Oracle | 1 +
 ... to Stop Tuning Penalties and Love Constraints | 1 +
 ...-parameterized regime using Rayleigh quotients | 1 +
 ...generative modeling with polynomial complexity | 1 +
 ...ograms in Human and Artificial Neural Networks | 1 +
 .../neurips/Convexity Certificates from Hessians | 1 +
 ...Graphs with Chebyshev Approximation, Revisited | 1 +
 ...ive Distribution Alignment via JSD Upper Bound | 1 +
 ...e Reduction for Generalized Linear Programming | 1 +
 ...Prior Helps Implicit Neural 3D representations | 1 +
 .../2022/neurips/Coreset for Line-Sets Clustering | 1 +
 ...esets for Relational Data and The Applications | 1 +
 ...zed Linear Regression and $K$-Means Clustering | 1 +
 ... Distributionally Robust Optimization Problems | 1 +
 ...aining for Optimizing Non-Decomposable Metrics | 1 +
 ...etwork embeddings for tensor-structured inputs | 1 +
 ...mage Models Extract Universal Representations? | 1 +
 ...ual Fairness with Partially Known Causal Graph | 1 +
 ...al Influence of Misinformation on Social Media | 1 +
 .../Counterfactual Temporal Point Processes | 1 +
 data/2022/neurips/Counterfactual harm | 1 +
 ...sk Alignments for Referring Image Segmentation | 1 +
 ...g for 3D Vision Tasks by Cross-View Completion | 1 +
 ... Aggregation Transformer for Image Restoration | 1 +
 ...ross-Image Context for Single Image Inpainting | 1 +
 ...ing for cross-modality representation learning | 1 +
 ... for Image-Guided Point Cloud Shape Completion | 1 +
 ...ncrypted Graph Convolutional Network Inference | 1 +
 ...ness of Learning Halfspaces with Massart Noise | 1 +
 ...ld Models Yields Zero-Shot Object Manipulation | 1 +
 ...ptimal Transport via Gradual Domain Adaptation | 1 +
 ... Cyclic Contrastive Language-Image Pretraining | 1 +
 ... and Algorithms for Universal Self-Supervision | 1 +
 ... a Log-Determinant Acyclicity Characterization | 1 +
 ...Disentanglement-Augmented Rationale Extraction | 1 +
 ...del with Diverse Accessories and Rich Textures | 1 +
 ...ort Constrained Offline Reinforcement Learning | 1 +
 .../DC-BENCH: Dataset Condensation Benchmark | 1 +
 ... A New Dataset For Automatic Medical Diagnosis | 1 +
 .../DENSE: Data-Free One-Shot Federated Learning | 1 +
 ... Financial Dataset for Graph Anomaly Detection | 1 +
 ...and Sparse Hierarchical Reinforcement Learning | 1 +
 ...Solver for Combinatorial Optimization Problems | 1 +
 ...ersarial Defense with Local Implicit Functions | 1 +
 ... for learning to locomote with a changing body | 1 +
 ... Optimization with a Dual Network Architecture | 1 +
 ...ralized Context in Meta-Reinforcement Learning | 1 +
 ...ic Exploration for Safe Reinforcement Learning | 1 +
 ...tically Optimal and Differentially Private PCA | 1 +
 ...robabilistic Model Sampling in Around 10 Steps | 1 +
 ...ng for Non-IID Clients via Secret Data Sharing | 1 +
 ... Guidance for Semi-Supervised Object Detection | 1 +
 ...ptation for Unsupervised Semantic Segmentation | 1 +
 ...ning spike timing and reconstructive attention | 1 +
 ...MC for Bayesian Inference from Privatized Data | 1 +
 ... Advancing Predictive Models of the Microbiome | 1 +
 ...e Emergent In-Context Learning in Transformers | 1 +
 ...for efficient learning from parametric experts | 1 +
 .../Data-Driven Conditional Robust Optimization | 1 +
 ...n-Making via Invariant Representation Learning | 1 +
 ...ient Augmentation for Training Neural Networks | 1 +
 ...fline Reinforcement Learning with Limited Data | 1 +
 ...Structured Pruning via Submodular Optimization | 1 +
 ...ps with heterogeneous outcomes in tabular data | 1 +
 ...DataMUX: Data Multiplexing for Neural Networks | 1 +
 ...t Distillation using Neural Feature Regression | 1 +
 .../Dataset Distillation via Factorization | 1 +
 .../Dataset Inference for Self-Supervised Models | 1 +
 ...mable Voxel Radiance Fields for Dynamic Scenes | 1 +
 ...Effects Estimation with Unmeasured Confounding | 1 +
 ...without Sample-Splitting for Stable Estimators | 1 +
 ...sed Self-Training for Semi-Supervised Learning | 1 +
 ...ommendation through Multi-Visit Clinic Records | 1 +
 ... via Learning Disentangled Causal Substructure | 1 +
 ...aches: An Influence Function Based Perspective | 1 +
 ...level Optimization over Communication Networks | 1 +
 ...ic Extra-Gradient for Variational Inequalities | 1 +
 ...oundation Models in Heterogeneous Environments | 1 +
 ...n-free Learning in Structured Matching Markets | 1 +
 ...Equivalent Sampling for Reinforcement Learning | 1 +
 .../Decision Trees with Short Explainable Rules | 1 +
 ...ng: Learning Locally Optimized Decision Losses | 1 +
 ...ransformers via Patch-wise Adversarial Removal | 1 +
 ... with Nearly-Linear Gradient Oracle Complexity | 1 +
 ...on for Class-Incremental Semantic Segmentation | 1 +
 ...eRF for Editing via Feature Field Distillation | 1 +
 ...n Similarity for Comparison of Neural Networks | 1 +
 ...essing for Context Augmented Language Modeling | 1 +
 .../Decoupled Self-supervised Learning for Graphs | 1 +
 ...hot Object Detection and Instance Segmentation | 1 +
 ...ical Propagation for Video Object Segmentation | 1 +
 ...orization: Retrieval-augmented Prompt Learning | 1 +
 ...ctive Learning by Leveraging Training Dynamics | 1 +
 ...s for Its Convergence: A Fine-Grained Analysis | 1 +
 ...g for Solving Constraint Optimization Problems | 1 +
 ...rectional Language-Knowledge Graph Pretraining | 1 +
 data/2022/neurips/Deep Combinatorial Aggregation | 1 +
 ... Compression of Pre-trained Transformer Models | 1 +
 ...timation with Categorical Background Variables | 1 +
 .../Deep Differentiable Logic Gate Networks | 1 +
 .../Deep Ensembles Work, But Are They Necessary? | 1 +
 ...eep Equilibrium Approaches to Diffusion Models | 1 +
 data/2022/neurips/Deep Fourier Up-Sampling | 1 +
 .../Deep Generalized Schr\303\266dinger Bridge" | 1 +
 .../Deep Generative Model for Periodic Graphs | 1 +
 .../Deep Hierarchical Planning from Pixels | 1 +
 ...ximal Inference via Maximum Moment Restriction | 1 +
 data/2022/neurips/Deep Model Reassembly | 1 +
 ...al Effect Estimation With Unstructured Proxies | 1 +
 ... Surrogate Assisted Generation of Environments | 1 +
 ...tworks with differentiable augmentation layers | 1 +
 ...sh Simulation with Deep Reinforcement Learning | 1 +
 ...: 3D Object Detection via Modality Interaction | 1 +
 ...Mediation Analysis with Debiased Deep Learning | 1 +
 ...ep Threshold-Optimal Policy for MDPs and RMABs | 1 +
 ... Adversarial Attacks via Neural Dynamic System | 1 +
 .../Defining and Characterizing Reward Gaming | 0
 ...e Transformer for Spectral Compressive Imaging | 1 +
 ...ging for Domain Adaptive Semantic Segmentation | 1 +
 ...Detection with Vision-Language Representations | 1 +
 ...into Sequential Patches for Deepfake Detection | 1 +
 .../Denoising Diffusion Restoration Models | 1 +
 .../neurips/Dense Interspecies Face Embedding | 1 +
 ...gularization for Out-of-distribution Detection | 1 +
 ...h with Prediction Concatenation in Deep Forest | 1 +
 ...gy Produce Heterogeneous Graph Neural Networks | 1 +
 ...ralleled Pre-training for Open-world Detection | 1 +
 ...Changes in Sequential Pairwise Comparison Data | 1 +
 ...zation of Changes in Conditional Distributions | 1 +
 ... with Normalizing Flows for Bayesian Inference | 1 +
 ...nnections for Locality Preserving Sparse Codes | 1 +
 ...: Differential Spectral Clustering of Features | 1 +
 ...tribution shift in real-world medical settings | 1 +
 ...es are as Effective as Structured State Spaces | 1 +
 ... Prior Dictionary Knowledge for Text-to-Speech | 1 +
 ...Quantum Computing for Optimization and Control | 1 +
 ...te gradient search for spiking neural networks | 1 +
 .../Differentially Private Covariance Revisited | 1 +
 ...ly Private Generalized Linear Models Revisited | 1 +
 ... via Sensitivity-Bounded Personalized PageRank | 1 +
 ...eeds Hidden State (Or Much Faster Convergence) | 1 +
 ...tially Private Learning with Margin Guarantees | 1 +
 ...es: Efficient Implementations and Applications | 1 +
 .../Differentially Private Model Compression | 1 +
 ...ally Private Online-to-batch for Smooth Losses | 1 +
 ...ating Local Curvature in High Dimensional Data | 1 +
 .../Diffusion Models as Plug-and-Play Priors | 1 +
 .../Diffusion Visual Counterfactual Explanations | 1 +
 ...usion-LM Improves Controllable Text Generation | 1 +
 ...cule Generation with Informative Prior Bridges | 1 +
 ...ularization for GAN Training with Limited Data | 1 +
 data/2022/neurips/Direct Advantage Estimation | 1 +
 data/2022/neurips/Discovered Policy Optimisation | 1 +
 .../Discovering Design Concepts for CAD Sketches | 1 +
 ...se-engineered Data-free Knowledge Distillation | 1 +
 ...iscovery of Single Independent Latent Variable | 1 +
 ...on for Goal Conditioned Reinforcement Learning | 0
 ... for Warm-Starting Algorithms with Predictions | 1 +
 ...ions in the Presence of Unobserved Confounders | 1 +
 ...g Transfer in Continual Reinforcement Learning | 1 +
 ...ep Ensembles through the Neural Tangent Kernel | 1 +
 ...r Input Attribution in the Deep Neural Network | 1 +
 ...ations from GAN Generator via Squeeze and Span | 1 +
 ...g Learning Rules with Brain Machine Interfaces | 1 +
 ...l variability using warped autoregressive HMMs | 1 +
 ...Robust Optimization with Non-Convex Objectives | 1 +
 ...s for Parallel MARL in Large Networked Systems | 1 +
 ...Reinforcement Learning for Multi-agent Systems | 1 +
 ...ntiles in the Reproducing Kernel Hilbert Space | 1 +
 ...onal Inequalities, with Theoretical Guarantees | 1 +
 ...vex Optimization with Compressed Communication | 1 +
 ...Dimension Independent Communication Complexity | 1 +
 ...ural Networks for Domain Adaptation Regression | 0
 ... Convergence of the Sliced Wasserstein Process | 1 +
 ...forcement Learning for Risk-Sensitive Policies | 1 +
 ...ective Multi-agent Deep Reinforcement Learning | 1 +
 ...utionally Adaptive Meta Reinforcement Learning | 1 +
 ...bust Optimization via Ball Oracle Acceleration | 1 +
 ...ionally Robust Optimization with Data Geometry | 1 +
 ...butionally robust weighted k-nearest neighbors | 1 +
 ...BO: Diversity-aware CASH for Ensemble Learning | 1 +
 ...eraging for Out-of-Distribution Generalization | 1 +
 ...endations for Agents with Adaptive Preferences | 1 +
 ...e generalization in one-shot generative models | 1 +
 ...ert More Attention to Vision-Language Tracking | 1 +
 ...n Adaptation via Adaptive Contrastive Learning | 1 +
 ...timization Methods in Deep Learning Even Help? | 1 +
 ...retize Neural Ordinary Differential Equations? | 1 +
 ...GNN Pretraining Help Molecular Representation? | 1 +
 ...the Implicit Regularization on Separable Data? | 1 +
 ...ly Improve Reinforcement Learning from Pixels? | 1 +
 ... meets Individual Fairness. And they get along | 1 +
 .../Domain Adaptation under Open Set Label Shift | 1 +
 ...Learning and Removing Domain-specific Features | 1 +
 ...n Generalization without Excess Empirical Risk | 1 +
 ...emporal Logic for Temporal Action Segmentation | 1 +
 ...ery Distortion of Matching Problems and Beyond | 0
 ...cing Certified Robustness through Transitivity | 1 +
 ... Bidirectional Offline Model-Based Imagination | 1 +
 .../Doubly Robust Counterfactual Classification | 1 +
 ...Making Value Iteration Asynchronous in Actions | 1 +
 ...mage Generation with Contextual RQ-Transformer | 1 +
 ...ribution with Neuro-Symbolic Generative Models | 1 +
 ...edding Table Placement for Recommender Systems | 1 +
 ...ective Method for Improving Deep Architectures | 1 +
 ...cer Prognosis Analysis with Whole Slide Images | 1 +
 ...k for Imbalanced Graph-level Anomaly Detection | 1 +
 ...lti-Label Recognition with Limited Annotations | 1 +
 ...ngeons and Data: A Large-Scale NetHack Dataset | 1 +
 ...Dynamic Fair Division with Partial Information | 1 +
 ...works Under Spatio-Temporal Distribution Shift | 1 +
 ...nt Learning for Characterizing Animal Behavior | 1 +
 .../Dynamic Learning in Large Matching Markets | 1 +
 ...nstraint under Unknown Parametric Demand Model | 1 +
 ...ries Classification: Learning What to \"See\"" | 1 +
 .../neurips/Dynamic Tensor Product Regression | 1 +
 ...g and assortment under a contextual MNL demand | 1 +
 ...ive Variants and Convergence to Exact Solution | 1 +
 ...cement Learning with Parallel Program Guidance | 1 +
 ...Automatic Reward Shaping in Language-guided RL | 1 +
 ...biased Compression in Distributed Optimization | 1 +
 ...nergy-Guided Stochastic Differential Equations | 1 +
 ...to-SQL Benchmark for Electronic Health Records | 1 +
 ...ical Reasoning with Adaptive Symbolic Compiler | 1 +
 ...or Evaluating Language-Augmented Visual Models | 1 +
 ...ing to Index and Search in Large Output Spaces | 1 +
 ...on Alignment as a Multi-Agent Intrinsic Reward | 1 +
 ...For Post-Processing Ensemble Weather Forecasts | 1 +
 ...mark: VIdeo Segmentations and Object Relations | 1 +
 ...t Aware Dose Allocation for Precision Medicine | 1 +
 ...al Representation Learning in Echocardiography | 1 +
 ...o-Cost Proxies For Neural Architecture Scoring | 1 +
 ... Training Mildly Parameterized Neural Networks | 1 +
 ...Time Transformers for Earth System Forecasting | 1 +
 ...Energy-Saving Attention with Linear Complexity | 1 +
 ...ask Co-Training for Unified Autonomous Driving | 1 +
 ... by Exploiting Sensitivity of Poisoned Samples | 1 +
 ... Dimension in Bandit Problems under Censorship | 1 +
 ...and Accurate Single-Stage Pedestrian Detection | 1 +
 ...ffects of Data Geometry in Early Deep Learning | 1 +
 ...ciency Ordering of Stochastic Gradient Descent | 1 +
 .../Efficient Active Learning with Abstention | 1 +
 ...Worst-Case-Aware Robust Reinforcement Learning | 1 +
 ...d Kernel Tests using Incomplete $U$-statistics | 1 +
 ...fficient Architecture Search for Diverse Tasks | 1 +
 ...istillation using Random Feature Approximation | 1 +
 ...or Generalized Low-Rank Matrix Bandit Problems | 1 +
 ...rity Computation with Alignment Regularization | 1 +
 ... Knowledge Distillation from Model Checkpoints | 1 +
 ... Learning for Preference-based Fast Adaptation | 1 +
 ...ent Methods for Non-stationary Online Learning | 1 +
 ...on via Self-supervised Information Aggregation | 1 +
 ...-Parametric Optimizer Search for Diverse Tasks | 1 +
 ...Extensive-Form Games via Online Mirror Descent | 1 +
 .../Efficient Risk-Averse Reinforcement Learning | 1 +
 ...ling on Riemannian Manifolds via Langevin MCMC | 1 +
 ...a Augmentation for Deep Reinforcement Learning | 1 +
 ...ence for Conditional GANs and Diffusion Models | 1 +
 ...timization under Noise: Local Search is Robust | 1 +
 ...ient Training of Low-Curvature Neural Networks | 1 +
 ...Augmentation Strategy for Adversarial Training | 1 +
 ...rouping via Meta Learning on Task Combinations | 1 +
 ...Effective Optimal Transport-Based Biclustering | 1 +
 ...Efficient and Modular Implicit Differentiation | 1 +
 ...line Learning for Generalized Linear Functions | 1 +
 ...ent and Stable Fully Dynamic Facility Location | 1 +
 ...capacity, and the emergence of retinal mosaics | 1 +
 ...rmative features in simulation-based inference | 1 +
 ...models with time-series privileged information | 1 +
 ...Former: Vision Transformers at MobileNet Speed | 1 +
 ...tants of Neural Networks via Bound Propagation | 1 +
 ...olean Matrices using Proximal Gradient Descent | 1 +
 ...Understanding Human Tasks in Egocentric Videos | 1 +
 .../neurips/Egocentric Video-Language Pretraining | 1 +
 ...tion for self-supervised multi-view stereopsis | 0
 .../Eliciting Thinking Hierarchy without a Prior | 1 +
 ...ign Space of Diffusion-Based Generative Models | 1 +
 ...amical systems with uncertainty quantification | 1 +
 .../Embodied Scene-aware Human Pose Estimation | 1 +
 ...p: VAEs Perform Independent Mechanism Analysis | 1 +
 ...e Approach for Spatio-Temporal Video Grounding | 1 +
 ...ingle Sheet of Self-Organizing Spiking Neurons | 1 +
 ... Generalization and Overfitting in Lewis Games | 1 +
 ...cal Conventions in a Visual Communication Game | 1 +
 ...rical Gateaux Derivatives for Causal Inference | 1 +
 ...hree-layer Neural Networks with Infinite Width | 1 +
 ... Evaluation Through Video Dataset Augmentation | 1 +
 ...t Networks: Extrapolation without Overthinking | 0
 ...tochastic Optimization with Energy-based Model | 1 +
 ...d-to-end Symbolic Regression with Transformers | 1 +
 ...Contrastive Learning of Visual Representations | 1 +
 ...presentation via Discrete Adversarial Training | 1 +
 ...nced Bilevel Optimization via Bregman Distance | 1 +
 ...l Image Denoising via Alternative Optimization | 1 +
 ...a Demonstrations in Sparse Reward Environments | 0
 ...fe Exploration Using Safety State Augmentation | 1 +
 ... Boosting Performance in Domain Generalization | 1 +
 ...Precision Quantization for Deep Network Design | 0
 ...orcement Learning Environment Execution Engine | 1 +
 ...lti-head Neural Network for Invariant Learning | 1 +
 .../Envy-free Policy Teaching to Multiple Agents | 1 +
 .../EpiGRAF: Rethinking training of 3D GANs | 1 +
 ...ivariant Graph Hierarchy-Based Neural Networks | 1 +
 .../Equivariant Networks for Crystal Structures | 1 +
 ...quivariant Networks for Zero-Shot Coordination | 1 +
 ...r Analysis of Tensor-Train Cross Approximation | 1 +
 .../neurips/Error Correction Code Transformer | 1 +
 ...ective Generalization on Class-Imbalanced Data | 1 +
 ...ation Efficient Nonconvex Distributed Learning | 1 +
 ...lizations in Deep Variational Quantum Circuits | 1 +
 ...el Correlations for Noisy Multi-Label Learning | 1 +
 ...formance When Both Covariates and Labels Shift | 1 +
 ... with applications to single-cell gene network | 1 +
 ...l ROC Curve and Lower Bounding the Maximal AUC | 1 +
 ...Constant Space with Improved Sample Complexity | 1 +
 ...or Meta Learning: Tightness and Expressiveness | 1 +
 ...ive Models with Contrastively Learned Features | 1 +
 ...-ML Models under Realistic Distribution Shifts | 1 +
 ...tion Performance on Document Image Classifiers | 1 +
 ...o Dataset Shift via Parametric Robustness Sets | 1 +
 ...rmance: Analyzing Concepts in AlphaZero in Hex | 1 +
 ...s Improves Robustness of Graph Neural Networks | 1 +
 ... Kernels under Benign and Adversarial Training | 1 +
 ... Shape Correspondence via 2D graph convolution | 1 +
 .../Exact Solutions of a Deep Linear Network | 1 +
 ...s of deep linear networks with prior knowledge | 1 +
 ...on for Weakly-Supervised Semantic Segmentation | 1 +
 ...for Compact Video-and-Language Representations | 1 +
 .../Expected Improvement for Contextual Bandits | 1 +
 ...ormer for Dense Prediction without Fine-tuning | 1 +
 ...nctionals in Reproducing Kernel Hilbert Spaces | 1 +
 ...g-Term Memory by predicting uncertain outcomes | 1 +
 .../neurips/Explainability Via Causal Self-Talk | 1 +
 ...le Reinforcement Learning via Model Transforms | 1 +
 .../Explaining Preferences with Shapley Values | 1 +
 data/2022/neurips/Explicable Policy Search | 1 +
 ...ersarial and Natural Distributional Robustness | 1 +
 ...rvative Exploitation via Linear Reward Shaping | 1 +
 ...xploitability Minimization in Games and Beyond | 1 +
 ...Semantic Relations for Glass Surface Detection | 1 +
 ...d Cosine Similarity for Attribution Protection | 1 +
 .../Exploration via Elliptical Episodic Bonuses | 1 +
 ...g for Information about the Optimal Trajectory | 1 +
 ...or Reinforcement Learning under Sparse Rewards | 1 +
 ...loring Example Influence in Continual Learning | 1 +
 ...ssignment Mechanism in Perceptual Organization | 0
 ...Length Generalization in Large Language Models | 1 +
 ...language models as protein function predictors | 1 +
 ...tion of AUPRC Optimization with List Stability | 1 +
 ...ace of Autoencoders with Interventional Assays | 1 +
 ...ng for Detoxifying Large-Scale Language Models | 1 +
 ...he Whole Rashomon Set of Sparse Decision Trees | 1 +
 ... Random Curiosity with General Value Functions | 1 +
 ...ased Reinforcement Learning via Score Matching | 1 +
 ...ntial Separations in Symmetric Neural Networks | 1 +
 ...sfeiler-Lehman Test with Graph Neural Networks | 1 +
 ...ructures for Fast and Accurate Sparse Training | 1 +
 ...oise-Adaptive Accelerated Second-Order Methods | 1 +
 ...echanisms from neural data using low-rank RNNs | 1 +
 ... with Hadamard Product: a Polynomial Net Study | 1 +
 ...rk for Fast Training-free Test-time Adaptation | 1 +
 ...overning Abstractions Behind Integer Sequences | 1 +
 ...undational Models for Expert Task Applications | 1 +
 ...d of Words Represented as Non-Linear Functions | 1 +
 ... Federated Learning Annotated Image Repository | 1 +
 ...ated Learning in Realistic Healthcare Settings | 1 +
 ...VR: Neural Volume Rendering for Face Animation | 1 +
 ...d for Monocular Real-time Human Reconstruction | 1 +
 .../FP8 Quantization: The Power of the Exponent | 1 +
 ... Folded Rationalization with a Unified Encoder | 1 +
 ...tion for Non-Stationary Reinforcement Learning | 1 +
 ...ionally Robust Policies for Contextual Bandits | 1 +
 ... Parameter Factorization & Similarity Matching | 0
 ...Language Models for Open-Ended Text Generation | 1 +
 ...es-Optimal Classifiers Under Predictive Parity | 1 +
 ... Biased Training Data Points Without Refitting | 1 +
 data/2022/neurips/Fair Rank Aggregation | 1 +
 .../Fair Ranking with Noisy Protected Attributes | 1 +
 .../Fair Wrapping for Black-box Predictions | 1 +
 ...ient Allocations Without Obvious Manipulations | 1 +
 ...Decision Trees: A Dynamic Programming Approach | 1 +
 ...ramework with Contrastive Adversarial Learning | 1 +
 data/2022/neurips/Fairness Reprogramming | 1 +
 ...rability Subject to Bounded Distribution Shift | 1 +
 ...rness in Federated Learning via Core-Stability | 1 +
 ...ut Demographics through Knowledge Distillation | 1 +
 ...proach for Approximate Nearest Neighbor Search | 1 +
 ...fore Extrapolation in Causal Effect Estimation | 1 +
 ...for Packing Proportional Fairness and its Dual | 1 +
 ...ts via Subsampling and Quasi-Newton Refinement | 1 +
 ...nt Process Intensity as Function of Covariates | 1 +
 ...h Bayesian Quadrature via Kernel Recombination | 1 +
 .../Fast Distance Oracles for Any Symmetric Norm | 1 +
 .../Fast Instrument Learning with Faster Rates | 1 +
 ...nt Descent with Normalization and Weight Decay | 1 +
 ...ural Kernel Embeddings for General Activations | 1 +
 ...ed Frank-Wolfe Algorithm under Parallelization | 1 +
 .../Fast Vision Transformers with HiLo Attention | 1 +
 ...nforcement Learning with Slower Online Network | 1 +
 .../Faster Linear Algebra for Distance Matrices | 1 +
 ...orithms for Densest Subgraph and Decomposition | 1 +
 ...k: Fast and Accurate Interpretable Risk Scores | 1 +
 data/2022/neurips/Fault-Aware Neural Code Rankers | 1 +
 .../FeLMi : Few shot Learning with hard Mixup | 1 +
 ...arized DNNs: Attraction Repulsion and Sparsity | 1 +
 ...re-Proxy Transformer for Few-Shot Segmentation | 1 +
 ... Local Updates Lead to Representation Learning | 1 +
 ...n Approach for Personalised Federated Learning | 1 +
 ...ted Learning with Rolling Sub-Model Extraction | 1 +
 ...n Generalization Method for Federated Learning | 1 +
 ...rained Models: A Contrastive Learning Approach | 1 +
 ...Audio-Visual Learning of Environment Acoustics | 1 +
 .../Few-Shot Continual Active Learning by a Robot | 1 +
 .../Few-Shot Fast-Adaptive Anomaly Detection | 1 +
 ...etric Learning with Deep Latent Variable Model | 1 +
 ...is Better and Cheaper than In-Context Learning | 1 +
 ...eration via Adaptation-Aware Kernel Modulation | 1 +
 ...on with Hilbert-Schmidt Independence Criterion | 1 +
 ... Reasoning via Connection Subgraph Pretraining | 1 +
 ...re Search for Distilling Large Language Models | 1 +
 ...ep Learning via Feature-wise Linear Modulation | 1 +
 ...ry Model for Long-term Time Series Forecasting | 1 +
 ...r Data-Driven Financial Reinforcement Learning | 1 +
 ...onstrained Markov Game: A Primal-Dual Approach | 1 +
 ...nvNets Using Counterfactual Simulation Testing | 1 +
 ...Occurring Physical Backdoors in Image Datasets | 1 +
 ...ts with Semi-bandit Feedback and Finite Budget | 1 +
 ...onconvex-Strongly-Concave Minimax Optimization | 1 +
 ...Finding and Listing Front-door Adjustment Sets | 1 +
 ...ralization for Modern Meta Learning Algorithms | 1 +
 ...antically Aligned Vision-Language Pre-Training | 1 +
 ...fectively by Optimizing Subnetworks Adaptively | 1 +
 ... using Activation Quantization with Guarantees | 1 +
 ...greement among humans with diverse preferences | 1 +
 ...lysis Of Dynamic Regression Parameter Learning | 1 +
 ...mple Maximum Likelihood Estimation of Location | 1 +
 ... Difference Learning with Deep Neural Networks | 1 +
 ...Convergence for Learning in Multi-Player Games | 1 +
 ...hms for Exponential Family Multi-Armed Bandits | 1 +
 ...Adaptation via Mutual Information Maximization | 1 +
 ...enerating Manifold, Graph and Categorical Data | 1 +
 ...s Better Than Last for Language Data Influence | 1 +
 ...Min-Max Optimization in Geodesic Metric Spaces | 1 +
 .../Fixed-Distance Hamiltonian Monte Carlo | 1 +
 ... a Visual Language Model for Few-Shot Learning | 1 +
 ...enomenological Nighttime Flare Removal Dataset | 1 +
 ...ry-Efficient Exact Attention with IO-Awareness | 1 +
 .../Flexible Diffusion Modeling of Long Videos | 1 +
 ...ible Neural Image Compression via Code Editing | 1 +
 ...MM: Flow-based continuous hidden Markov models | 1 +
 ...lowification: Everything is a normalizing flow | 1 +
 ...isual navigation using panoramic stereo vision | 1 +
 data/2022/neurips/Focal Modulation Networks | 1 +
 ...Markov Decision Processes with Bandit Feedback | 1 +
 ...sting Future World Events With Neural Networks | 1 +
 ...orecasting Human Trajectory from Scene History | 1 +
 ...tency and Coherence of Representation Learning | 1 +
 ...mulating Robustness Against Unforeseen Attacks | 1 +
 ... for Hidden Continuous-Time semi-Markov Chains | 1 +
 ...eriors for Approximate Probabilistic Inference | 1 +
 ...mer Meets Generalized Fourier Integral Theorem | 1 +
 ...cal optical encoders for computational imaging | 1 +
 ...gorithms for Approximating Tyler's M-estimator | 1 +
 ...omponents for Training GANs under Limited Data | 1 +
 ...f feed-forward fully connected neural networks | 1 +
 ...Powerful Defense against Data Poisoning Attack | 1 +
 ...s to Learning with Stochastic Gradient Descent | 1 +
 ...tage 3D Object Detection on LiDAR Range Images | 1 +
 .../2022/neurips/Fully Sparse 3D Object Detection | 1 +
 ...iable Nonlinear Independent Component Analysis | 1 +
 .../2022/neurips/Functional Ensemble Distillation | 1 +
 ... for Better Out-of-distribution Generalization | 1 +
 ...lternating Least Squares for Tensor Clustering | 1 +
 data/2022/neurips/Fuzzy Learning Machine | 1 +
 ...Age-path of Generalized Self-paced Regularizer | 1 +
 ...ecentralized Multi-Organization Collaborations | 1 +
 ...ent Learning via Generalizable Logic Synthesis | 1 +
 ...erative Adversarial Multi-Object Scene Attacks | 1 +
 ...zed Autoregressive Paraphrase-Identification X | 1 +
 ...lized Autoregression for Multi-Fidelity Fusion | 1 +
 ...al Architect for Immersive 3D Scene Generation | 1 +
 ...synchronous Training for Recommendation Models | 1 +
 ...ENIE: Higher-Order Denoising Diffusion Solvers | 1 +
 ...Quality 3D Textured Shapes Learned from Images | 1 +
 ...te-and-Fire Neuron for Spiking Neural Networks | 1 +
 ...Localization and Vision-Language Understanding | 1 +
 ...tudinal Human Behavior Modeling Generalization | 1 +
 ... based Generative Semantic Segmentation Models | 1 +
 .../GOOD: A Graph Out-of-Distribution Benchmark | 1 +
 ...atrix Multiplication for Transformers at Scale | 0
 ...etrosynthetic Planning with Goal-driven Policy | 1 +
 ...ramework for Learning Graph Distance Functions | 1 +
 ...etworks with Structure-Aware Cooperative Games | 1 +
 ...Synthesis with Generative Adversarial Networks | 1 +
 ...rediction-based metric between representations | 1 +
 data/2022/neurips/Gaussian Copula Embeddings | 1 +
 ...ing of Generalizable Signed Distance Functions | 1 +
 ...for Generalizable Out-Of-Domain Text-to-Speech | 0
 ...-Propagation-Based Neural Network Verification | 1 +
 .../Generalised Implicit Neural Representations | 1 +
 ...tual Information for Discriminative Clustering | 1 +
 ...Passing Neural Networks on Large Random Graphs | 1 +
 ...nalysis on Learning with a Concurrent Verifier | 1 +
 ...mating Causal Effects of Continuous Treatments | 1 +
 ...ient Methods via Discrete and Continuous Prior | 1 +
 ...Class via Distributionally Robust Optimization | 1 +
 ...r Bounds on Deep Learning with Markov Datasets | 1 +
 .../Generalization Gap in Amortized Inference | 1 +
 ...AS under Activation and Skip Connection Search | 1 +
 ...ification with overparameterized linear models | 1 +
 ... Post-Click Information in Recommender Systems | 1 +
 data/2022/neurips/Generalized Laplacian Eigenmaps | 1 +
 ... Adaptation of Generative Adversarial Networks | 1 +
 ... Gaussian Measures meet Bayesian Deep Learning | 1 +
 ...Optimization with Decision-theoretic Entropies | 1 +
 ...jection to be Compatible with Arbitrary Losses | 1 +
 ...ent Learning with Variational Causal Reasoning | 1 +
 .../Generating Long Videos of Dynamic Scenes | 1 +
 ...dels: Towards Zero-Shot Language Understanding | 1 +
 ...with COmmon Source CoordInated GAN (COSCI-GAN) | 1 +
 .../Generative Neural Articulated Radiance Fields | 1 +
 ... Information Decoupling for Image Rain Removal | 1 +
 ...g with Diffusion, Denoise, and Disentanglement | 1 +
 ...ional Control of Pre-Trained Generative Models | 1 +
 ... learning mitigates target-causing confounding | 1 +
 ...r for physics-informed (and) operator learning | 1 +
 ...urfaces Learning for Multi-view Reconstruction | 1 +
 ...ble Geometric Shapes in Deep Image Classifiers | 1 +
 ... Few-Shot Generalization in Euclidean Geometry | 1 +
 ...rk for Efficient Graph Representation Learning | 1 +
 .../Geodesic Self-Attention for 3D Point Clouds | 1 +
 ...Topology Compression for Graph Neural Networks | 1 +
 .../Geometric Order Learning for Rank Estimation | 1 +
 ...e PIFu Representation for Human Reconstruction | 1 +
 ...ating Sparse Training with Gradient Correction | 1 +
 ...ance Cheap Operation with Long-Range Attention | 1 +
 ...ale Kernel Matrix-Vector Multiplication on GPU | 1 +
 ...ractive Student Programs with Meta-Exploration | 1 +
 ...Interpretable, Leak-proof Concept-based Models | 1 +
 ...e and Stability of Stochastic Gradient Descent | 1 +
 ...nce of Federated Learning for Mixed Regression | 1 +
 ...gence of IRLS for Non-Smooth Robust Regression | 1 +
 ...ming Speech Recognition in a Modular Framework | 1 +
 ...al K-Medoids Clustering of One Million Samples | 1 +
 ...Convergent Policy Search for Output Estimation | 1 +
 .../neurips/Globally Gated Deep Linear Networks | 1 +
 ...g deep learning: How much physics do we need?" | 1 +
 ...tter Data Permutations than Random Reshuffling | 1 +
 ...ricted Secant Inequality And Upper Error Bound | 1 +
 .../Gradient Descent: The Ultimate Optimizer | 1 +
 ...dient Estimation with Discrete Stein Operators | 1 +
 ...thods Provably Converge to Non-Robust Networks | 1 +
 ...networks for square loss and orthogonal inputs | 1 +
 ...nd Stochastic Nonsmooth Nonconvex Optimization | 1 +
 ...Assembly and Viral Quasispecies Reconstruction | 1 +
 ...ng Guarantee and Item Mixture Powered Strategy | 1 +
 ...ew-shot Learning with Task-specific Structures | 1 +
 ...g Assisted Multi-Objective Integer Programming | 0
 data/2022/neurips/Graph Neural Network Bandits | 1 +
 .../Graph Neural Networks are Dynamic Programmers | 1 +
 .../Graph Neural Networks with Adaptive Readouts | 1 +
 ...ering for Cache-Efficient Near Neighbor Search | 1 +
 .../Graph Scattering beyond Wavelet Shackles | 1 +
 ...ed Learning with Accurate Discrepancy Learning | 1 +
 ...ng and Out-of-Distribution Detection on Graphs | 1 +
 ...: Quantum Neural Tangent Kernel for Graph Data | 1 +
 ...omolecular Structures and Interaction Networks | 1 +
 ...l Vision Transformer for Masked Image Modeling | 1 +
 ...riddlyJS: A Web IDE for Reinforcement Learning | 1 +
 ... Learning to Win the Game under Human Commands | 1 +
 .../neurips/Grounded Video Situation Recognition | 1 +
 ...ncertainty for Unsupervised Environment Design | 1 +
 ...tocratic Fairness in Linear Contextual Bandits | 1 +
 ... Framework for Continuous Categories Discovery | 1 +
 ...tum for Learning Particle-based Fluid Dynamics | 1 +
 ...dinal Dataset of Commercial ML API Predictions | 1 +
 ...ce Reconstruction Using High-Frequency Details | 1 +
 ...or Modeling Surfaces with Arbitrary Topologies | 1 +
 ...for 3D Point Clouds by Learning Hyper Surfaces | 1 +
 ...nditioned Human Motion Generation in 3D Scenes | 1 +
 ...rchitecture for Accelerated MRI Reconstruction | 1 +
 ...for Long-Horizon Prediction of Event Sequences | 1 +
 ... and motion disentanglement in image sequences | 1 +
 .../Hand-Object Interaction Image Generation | 1 +
 ...munication in Physical and Social Environments | 1 +
 .../Handcrafted Backdoors in Deep Neural Networks | 1 +
 ...ntations for Objects with Strong Spurious Cues | 1 +
 ...Markov Decision Processes: Theory and Practice | 1 +
 ... Learning for Two-Hidden-Layer Neural Networks | 1 +
 ...strategies of deep neural networks with humans | 1 +
 ...istribution Matching for Human Pose Estimation | 1 +
 ...Augmentation in Probabilistic Graphical Models | 1 +
 ...rogeneous Skill Learning for Multi-agent Tasks | 0
 ...D Learns Parities Near the Computational Limit | 1 +
 .../Hiding Images in Deep Probabilistic Models | 1 +
 ...upervised Representations for Speech Synthesis | 1 +
 ...ive Graph Clustering in Poly-Logarithmic Depth | 1 +
 ...Communication-efficient Collaborative Learning | 1 +
 ... Graph Transformer with Adaptive Node Sampling | 1 +
 ...e Layer for Partially Monotone Neural Networks | 0
 ...lization for Robust Monocular Depth Estimation | 1 +
 ...al classification at multiple operating points | 1 +
 ...raph Neural Networks with Tensor Decomposition | 1 +
 ...ssian Processes under Monotonicity Constraints | 1 +
 ... One Gradient Step Improves the Representation | 1 +
 ...r SGD: Effective dynamics and critical scaling | 1 +
 ...Distillation for Cross-Dimensionality Networks | 1 +
 ...act Gradients Through Finite Size Oscillations | 1 +
 data/2022/neurips/Homomorphic Matrix Completion | 1 +
 ...lization in Competitive Reinforcement Learning | 1 +
 ...Interactions with Recursive Gated Convolutions | 1 +
 ...tasets via Capacity-Aware Neuron Steganography | 1 +
 ...oretical Understandings of Masked Autoencoders | 1 +
 ...re K-hop Message Passing Graph Neural Networks | 1 +
 ...s the Robustness of Stochastic Neural Networks | 1 +
 ...Video Representations Based on Synthetic Data? | 1 +
 ... Model Human Real-time and Life-long Learning? | 1 +
 ...eel? Estimating Wellbeing From Video Scenarios | 1 +
 ... On the Incentives of Users of Learning Agents | 1 +
 ...earn: Instructions, descriptions, and autonomy | 1 +
 ...sfer Learning from A Hub of Pre-trained Models | 1 +
 .../Human-AI Collaborative Bayesian Optimisation | 1 +
 .../Human-AI Shared Control via Policy Dissection | 1 +
 ...s Collaborating Agents for Symmetrical Walking | 1 +
 ... Detector to Model the Manual Labeling Process | 1 +
 ...ng in Visual and Other Sensory Neuroprostheses | 1 +
 ...Models: Sampling Unseen Neural Network Weights | 1 +
 ...Adaptation for Generative Adversarial Networks | 1 +
 ...opic Taxonomy Mining with Hyperbolic Embedding | 1 +
 ...erTree Proof Search for Neural Theorem Proving | 1 +
 ...nference for Structured Multi-Label Prediction | 1 +
 ... Estimation and Infinite Sampling on Manifolds | 1 +
 ...nalysis and a Scalable Hyper-Ensemble Solution | 1 +
 ...g for Differentially Private Linear Regression | 1 +
 ...t Attention for Zero-Shot Image Classification | 1 +
 ...2Q: A Fully Decentralized Q-Learning Algorithm | 0
 ...KEA-Manual: Seeing Shape Assembly Step by Step | 1 +
 ... Maximization Loss for Spiking Neural Networks | 1 +
 ... learning of ergodic Markov decision processes | 1 +
 ...mplicit Neural Representation for Audio Scenes | 0
 ...iple experts in Inverse Reinforcement Learning | 1 +
 ...enerative models without auxiliary information | 1 +
 ...ent: A bridge to Gaussian Differential Privacy | 1 +
 ...ently learn low-degree plus sparse polynomials | 1 +
 ...ons are the Answer, Then What is the Question? | 1 +
 ...e Trouble: Revisiting Neural-Collapse Geometry | 1 +
 ...mitating Past Successes can be Very Suboptimal | 1 +
 ...minacy for Explanations of Automated Decisions | 1 +
 ...rized Models: On Equivalence to Mirror Descent | 1 +
 ... Neural Representations with Levels-of-Experts | 1 +
 ...ct Risk Trajectories of SGD in High Dimensions | 1 +
 ...Implicit Warping for Animation with Image Sets | 1 +
 ...Improved Algorithms for Neural Active Learning | 1 +
 ...ty for Representing Piecewise Linear Functions | 1 +
 ... Reduction and its Application to Optimization | 1 +
 .../Improved Coresets for Euclidean k-Means | 1 +
 ...l Private Linear Operators on Adaptive Streams | 1 +
 ...ed Feature Distillation via Projector Ensemble | 1 +
 ...-Tuning by Better Leveraging Pre-Training Data | 1 +
 ...vex Regularizers with Global Optima Guarantees | 1 +
 ...r Bandits and Horizon-Free Linear Mixture MDPs | 1 +
 ...proved Utility Analysis of Private CountSketch | 1 +
 ...ved techniques for deterministic l2 robustness | 1 +
 ...
Synthesis with A Geometry-aware Discriminator | 1 + ...criminating Unlabeled Samples with Super-Class | 0 ...ia Statistical Learning with Logical Reasoning | 1 + ...or Inverse Problems using Manifold Constraints | 1 + .../Improving GANs with A Dynamic Discriminator | 1 + ...works via Adversarial Learning in Latent Space | 1 + ...trinsic Exploration with Language Abstractions | 1 + ...ns with Nesterov's Accelerated Gradient Method | 1 + ...by Adversarial Training with Structured Priors | 1 + ...cy Learning via Language Dynamics Distillation | 1 + ...ng by Characterizing Idealized Representations | 1 + ...earning via Adaptive Vicinal Risk Minimization | 1 + ...ansformer with an Admixture of Attention Heads | 1 + ...encoders with Density Gap-based Regularization | 1 + ...earning using Generalized Similarity Functions | 1 + ...ary Scalarization for Deep Multi-Task Learning | 1 + ...Histogram Leakage in Ensemble Private Learning | 1 + ...orks Invariant and How Should We Measure This? | 1 + ...r: Robust Prediction with Causal User Modeling | 1 + ...Incentivizing Combinatorial Bandit Exploration | 1 + data/2022/neurips/Inception Transformer | 1 + ...o Contrastive Loss for Collaborative Filtering | 1 + ...nfidence in Adversarial Robustness Evaluations | 1 + ...tive Bayesian Optimization in Nested Subspaces | 1 + ...ement Learning under Mixed and Delayed Rewards | 1 + ...e Testing for Bounded Degree Bayesian Networks | 1 + ...asurement Error and Linear Non-Gaussian Models | 1 + ...Improving Optimization of Adversarial Examples | 1 + ...ous Design-and-Play Ensures Global Convergence | 1 + ... Classifier at the End of Deep Neural Network? | 1 + ...ve Logical Query Answering in Knowledge Graphs | 1 + .../Inference and Sampling for Archimax Copulas | 1 + ...commendation Networks: A Data-Centric Approach | 1 + ...lity Coregionalization for Physical Simulation | 1 + ... Behavior in Multiagent Reinforcement Learning | 1 + ...gression: relevancy, efficiency and optimality | 1 + ...ompression with Variational Energy-based Model | 1 + ...retic Safe Exploration with Gaussian Processes | 1 + ...ble Reinforcement Learning in Natural Language | 1 + ...al Networks for Predicting Material Properties | 1 + ...rformant Insertion-based Text Generation Model | 1 + ...roposal for Online Video Instance Segmentation | 1 + ... into Pre-training via Simpler Synthetic Tasks | 1 + ...ima in GAN Training with Kernel Discriminators | 1 + ...timation for Gradient-Boosted Regression Trees | 1 + ...on in Linear MDPs via Online Experiment Design | 1 + ...e-based Learning for Knowledge Base Completion | 1 + ...-optimal PAC Algorithms for Contextual Bandits | 1 + .../Integral Probability Metrics PAC-Bayes Bounds | 1 + .../Interaction Modeling with Multiplex Attention | 1 + ...ounded Learning with Action-Inclusive Feedback | 1 + ...Transformer for Few-Shot Semantic Segmentation | 1 + ...olation and Regularization for Causal Learning | 1 + ...rspective from Influence-Directed Explanations | 1 + ...Experimental Design for Causal Models at Scale | 1 + ...gent speech permits zero-shot task acquisition | 1 + ...ensionality estimation using Normalizing Flows | 0 ...tage approach for Inference in Neural Networks | 1 + .../Invariance Learning based on Label Hierarchy | 1 + ...rks with Differentiable Laplace Approximations | 1 + ...riance-Aware Randomized Smoothing Certificates | 1 + ... 
Representations for Anti-Causal Domain Shifts | 1 + ...re Interactions using Graph Network Simulators | 1 + ...erg Games: the Blessing of Bounded Rationality | 1 + ...tible Monotone Operators for Normalizing Flows | 1 + ... Arithmetic Enough for Deep Learning Training? | 1 + .../Is Out-of-Distribution Detection Learnable? | 1 + .../Is Sortition Both Representative and Fair? | 1 + .../neurips/Is a Modular Architecture Enough? | 1 + ...hmark for noisy and ambiguous label estimation | 1 + ...nd Query Efficient Model Agnostic Explanations | 0 ...oncontrollable Visual Dynamics in World Models | 1 + ... 3D Adversarial Examples in the Physical World | 1 + ...n Generalization with Logarithmic Environments | 1 + .../2022/neurips/Iterative Scene Graph Generation | 1 + ...rative Structural Inference of Directed Graphs | 1 + ...n Joint Architecture And Hyperparameter Search | 1 + ...g Predictive Uncertainty Under Covariate Shift | 1 + ...h For Maximally-Informed Bayesian Optimization | 1 + ...arch for Multi-Objective Bayesian Optimization | 1 + ... 2D-3D Weakly Supervised Semantic Segmentation | 1 + ...apturing High-order Statistics in Transformers | 1 + ...sferable Visual Models with External Knowledge | 1 + ...tonomous Driving in Various Weather Conditions | 1 + ... Positional Embedding for Length Extrapolation | 1 + .../neurips/KSD Aggregated Goodness-of-fit Test | 1 + ...k! Wasserstein GANs are not Optimal Transport? | 1 + .../Kernel Interpolation with Sparse Grids | 1 + ...orks: A Unifying Framework for Memory Modeling | 1 + .../Kernel Multimodal Continuous Attention | 1 + ...rnel similarity matching with Hebbian networks | 1 + ...pplications in Heterogeneous Domain Adaptation | 1 + ...ructure Augmentation for Graph Neural Networks | 1 + ...Knowledge Distillation from A Stronger Teacher | 1 + ...stillation: Bad Models Can Be Good Role Models | 1 + .../Knowledge-Aware Bayesian Deep Topic Model | 1 + ...for training next generation image-text models | 1 + ...Text from Gradients with Language Model Priors | 1 + ...ptimization for Offline Reinforcement Learning | 1 + ...om Sparse Image Ensemble via 3D Part Discovery | 1 + ...for Individual-Level Unbiased Learning to Rank | 1 + ...Cooperative Multi-Agent Reinforcement Learning | 1 + ...sodic Count for Task-Specific Intrinsic Reward | 1 + ... Denoising Network for Video-Language Modeling | 1 + ...Tuning for Non-language Machine Learning Tasks | 1 + ...Point Diffusion Models for 3D Shape Generation | 1 + ...Industrial Physical Simulation benchmark suite | 1 + ...Interpretable Skill Abstractions from Language | 1 + ...ptation for Label-Efficient OOD Generalization | 0 ... Training on Improving l2 Certified Robustness | 1 + ...rameter and Memory Efficient Transfer Learning | 1 + ...able Thresholding Neurons and Moderate Dropout | 1 + ... 
Novel Perspective to Study Robust Overfitting | 1 + ...lti-Label Learning with Single Positive Labels | 1 + ...ation for Semi-Supervised Graph Classification | 1 + ...oders for Learning Deep Latent Variable Models | 1 + ...ial Relation Reasoning for 3D Object Grounding | 1 + ...rs are Strong Few-Shot Video-Language Learners | 1 + ...coders for Learning Stochastic Representations | 1 + .../Large Language Models are Zero-Shot Reasoners | 1 + ...fferentiable Causal Discovery of Factor Graphs | 1 + ...rge-Scale Retrieval for Reinforcement Learning | 1 + ...dictions: Training Faster R-CNN in 4.2 Minutes | 1 + ...Partial AUC in a Range of False Positive Rates | 1 + ...tive Structure-aware Generative Language Model | 1 + ...t Method for Monotone Variational Inequalities | 1 + .../Latency-aware Spatial-wise Dynamic Networks | 1 + ...usal Structure Discovery with Rank Constraints | 1 + .../Latent Planning via Expansive Tree Search | 1 + ...ces of a Generic Framework for Sparse Training | 1 + ... MAP Inference for Determinantal Point Process | 1 + ... Thought Chains for Science Question Answering | 1 + ...nforcement Learning in Markov Matching Markets | 1 + ...itation learning with task-relevant embeddings | 1 + ...variant and Equivariant Convolutional Networks | 1 + ...arning (Very) Simple Generative Models Is Hard | 1 + ...ning Active Camera for Multi-Object Navigation | 1 + ... Dynamics with Lagrangian Graph Neural Network | 1 + ...Using Scene Graphs for Audio Source Separation | 1 + ...g Best Combination for Efficient N: M Sparsity | 1 + ...te Graphs: Heavy Tails and Multiple Components | 1 + ...r Out-of-Distribution Generalization on Graphs | 1 + ...arning Chaotic Dynamics in Dissipative Systems | 0 ...ncept Credible Models for Mitigating Shortcuts | 1 + ... Functions Progressively from Raw Point Clouds | 1 + ...Contrastive Embedding in Low-Dimensional Space | 1 + ...ning Debiased Classifier with Biased Committee | 1 + .../Learning Deep Input-Output Stable Dynamics | 1 + ...ple Views for Low-shot Category Generalization | 1 + ... and Representative Modes for Image Captioning | 1 + ...etwork Load Balancing as Markov Potential Game | 1 + ...Networks in the Presence of Arbitrary Outliers | 1 + ...egression in Reproducing Kernel Hilbert Spaces | 1 + ...formers via Fine-Grained Manifold Distillation | 1 + ...Networks with Generalized Fenchel-Young Losses | 1 + ... 
for Tabular Data via Neighborhood Propagation | 0 ...ant Segmentation with Instance-Unique Querying | 1 + ...ations with Mixture of Expert Neural Processes | 1 + ...es in Neural Stochastic Differential Equations | 1 + ...Models in a Handful of Reward-Free Deployments | 1 + ...le Routing Problems via Knowledge Distillation | 1 + ...sed Feature Representation for 3D Point Clouds | 1 + ...r Relational Stochastic Shortest Path Problems | 1 + ...ck-tracing for Object Tracking in Event Clouds | 1 + ...ised Clustering Approach Using Adaptive Fusion | 1 + ...tless Multi-Action Bandits via Index Awareness | 1 + ...ace Conditions in Domain Decomposition Solvers | 1 + ...tations for Out-of-Distribution Generalization | 1 + ...nd Representations for Time Series Forecasting | 1 + ...Term Crop Management Strategies with CyclesGym | 1 + ...ions with Conditional Variational Autoencoders | 1 + ...ed Multinomial Logits with Provable Guarantees | 1 + ...ng Modular Simulations for Homogeneous Systems | 1 + ...h Spectral Attention for Robust Shape Matching | 1 + ...m Graph and Provable Auction-Fitted Q-learning | 1 + data/2022/neurips/Learning Neural Acoustic Fields | 1 + ... Set Functions Under the Optimal Subset Oracle | 1 + ...ing Optical Flow from Continuous Spike Streams | 1 + ... Flows for Non-Equilibrium Importance Sampling | 1 + .../2022/neurips/Learning Options via Compression | 1 + .../Learning Partial Equivariances From Data | 1 + ...mics with Subequivariant Graph Neural Networks | 1 + ...hysics Constrained Dynamics Using Autoencoders | 1 + ...ng Predictions for Algorithms with Predictions | 1 + ...dels from Generator Latent Spaces with Hat EBM | 1 + ...nce Environment to Enhance Prediction Accuracy | 1 + ...avioral Metric for Deep Reinforcement Learning | 1 + ...ust Dynamics through Variational Sparse Gating | 1 + ...for Abstract Reasoning via Internal Inferences | 1 + ...sual Representations from Audible Interactions | 1 + ...erarchical Representation Learning by Chunking | 1 + ... 
Out-of-Distribution Molecular Representations | 1 + ...erpoint Graph Cut for 3D Instance Segmentation | 1 + .../neurips/Learning Symmetric Rules with SATNet | 1 + ...istic Models from Inconsistent Local Estimates | 1 + ...ction Approximation and Correlated Equilibrium | 0 ...presentations by Recovering Tokens in 3D Space | 1 + ...ory-Efficient Video Class-Incremental Learning | 1 + ...endent Random Variables with Unbounded Support | 1 + ...of deep linear networks with multiple pathways | 1 + ...ual Linear Bandits Without Sharing the Context | 0 ...th Composition and Locality at Multiple Scales | 1 + ...f-Training Framework for Semantic Segmentation | 1 + ...Label Proportions by Learning with Label Noise | 1 + ...arning from Stochastically Revealed Preference | 1 + .../Learning from a Sample in Online Algorithms | 1 + ...rning in Congestion Games with Bandit Feedback | 1 + ...s, without Computationally Intractable Oracles | 1 + ...ical systems with latent Gaussian process ODEs | 1 + ...ble natural features from retina using a U-net | 1 + ...bitrary Graph Topologies via Predictive Coding | 1 + ...nline Learning with Stochastic Feedback Graphs | 1 + ...ngle-index models with shallow neural networks | 1 + ...res can lead to overfitting in neural networks | 1 + ...ge Networked Systems Obeying Conservation Laws | 1 + ...erential Equations via Latent Global Evolution | 1 + ...-based Reinforcement Learning Attack Framework | 1 + .../neurips/Learning to Branch with Tree MDPs | 1 + ...igating Repetitions for Neural Text Generation | 1 + ...in Branch and Bound with Graph Neural Networks | 1 + ...ter Networks with Neural Algorithmic Reasoning | 1 + ... Policy Optimization with Virtual Trust Region | 1 + .../Learning to Discover and Detect Objects | 1 + ...Adversarial Approach to Training Sequence VAEs | 1 + ...ing to Follow Instructions in Text-Based Games | 1 + ...enerate Inversion-Resistant Model Explanations | 1 + ...to Mitigate AI Collusion on Economic Platforms | 1 + ...g to Navigate Wikipedia by Taking Random Walks | 1 + ...ptimal Transport for Imbalanced Classification | 1 + ...neralization, Unseen Data and Boolean Measures | 1 + ...Spatiotemporal Graphs with Sparse Observations | 1 + ...-shot Reasoning over Temporal Knowledge Graphs | 1 + ...ld: Optimizing Model Explanations for Teaching | 1 + ...n Networked Multi-Agent Reinforcement Learning | 0 ...ution and pooling operations in kernel methods | 1 + data/2022/neurips/Learning with little mixing | 1 + ...for Online Linear and Semidefinite Programming | 1 + ... Environments Using GNNs and Temporal Encoding | 1 + ...N-based best-first search for Sokoban Planning | 1 + .../Less-forgetting Multi-lingual Fine-tuning | 1 + ... 
Cloud Cross-Modal Training for Shape Analysis | 1 + .../Lethal Dose Conjecture on Data Poisoning | 1 + ...t Offline Reinforcement Learning in Healthcare | 1 + ...yer Dependency for Post -Training Quantization | 1 + ...ptive Bidding in Repeated First-Price Auctions | 1 + .../LieGG: Studying Learned Lie Group Generators | 1 + ...earning Cumulatively Online without Forgetting | 1 + ...ting Weak Supervision To Structured Prediction | 1 + ...is of Thompson Sampling for Contextual Bandits | 1 + .../Linear Label Ranking with Bounded Noise | 1 + data/2022/neurips/Linear tree shap | 1 + .../Lipschitz Bandits with Batched Feedback | 1 + .../neurips/List-Decodable Sparse Mean Estimation | 1 + ...n Estimation via Difference-of-Pairs Filtering | 1 + ...c Interpretability for Audio Networks with NMF | 1 + ...hitecture Search for Efficient Language Models | 1 + ... Stationary Distribution Correction Estimation | 1 + ...mization via maximizing probability of descent | 1 + ...ility of Deep ReLU Neural Networks: the Theory | 1 + ...e Bayesian Optimization over Structured Inputs | 1 + ...bspace Optimization via Strict Complementarity | 1 + ... in Contextual Bandits with Continuous Actions | 1 + ... Longitudinally-consistent Neuroimage Analysis | 1 + ...l-Global MCMC kernels: the best of both worlds | 1 + ... Auto-Regressive Modeling for Image Generation | 1 + ...cating and Editing Factual Associations in GPT | 1 + ...l Noise Distributions for Differential Privacy | 1 + ...e Gaussian Processes Using Binary Tree Kernels | 1 + .../neurips/Log-Polar Space Convolution Layers | 1 + ...Logical Reasoning via Adversarial Pre-training | 1 + ...equivalents of Probabilistic Boolean Operators | 1 + data/2022/neurips/Logical Credal Networks | 1 + data/2022/neurips/Long Range Graph Benchmark | 1 + ... with Multimodal Temporal Contrastive Learning | 1 + ...Knowledge Distillation for 3D Visual Grounding | 1 + .../Look More but Care Less in Video Recognition | 1 + ...eneralization in visual Reinforcement Learning | 1 + ...t Multilingual and Multitask Speech Processing | 1 + ...tangled models at combinatorial generalisation | 1 + ...Initializations with Sparse Trainable Networks | 1 + ...ular Reinforcement Learning via Muscle Synergy | 1 + ...sport: Approximation, Statistics and Debiasing | 1 + ...ral networks via matrix differential equations | 1 + ...ibuted Learning with Communication Compression | 1 + ...Preconditioned Lasso via Robust Sparse Designs | 1 + .../Luckiness in Multiscale Online Learning | 1 + .../M2N: Mesh Movement Networks for PDE Solvers | 1 + ...Musical Score Provided Mandarin Singing Corpus | 1 + ...ster Forest Training Using Multi-Armed Bandits | 1 + ...al Networks for Fast and Accurate Force Fields | 1 + ...ual Knowledge for Unpaired Image-text Matching | 1 + ...earning in Distributed Target Coverage Control | 1 + .../MAgNet: Mesh Agnostic Neural PDE Solver | 1 + ... A Manifold Attention Network for EEG Decoding | 1 + .../MBW: Multi-view Bootstrapping in the Wild | 1 + ...works with Multiple Specialized Discriminators | 1 + ...: Masked Convolution Meets Masked Autoencoders | 1 + ... for Prediction, Generation, and Interpolation | 1 + ...ime Robustness via Adaptation and Augmentation | 1 + ... Targeted Sentiment on COVID-19 Related Tweets | 1 + ...odel Extraction Crossover Membership Inference | 1 + ...ale Graph Neural Networks with Implicit Layers | 1 + ... 
for Multi-Object Multi-Actor Activity Parsing | 1 + ...stness Evaluation with Model Reweighing Attack | 1 + ...ised Movable Object Segmentation and Detection | 1 + ...it String Dataset for Handwriting Verification | 1 + ...tructure Across Multiple Levels of Abstraction | 1 + ...or Real-World Multi-View Object Classification | 1 + ...n Stronger: A Sparsified Perturbation Approach | 1 + ...and Efficient Single-Step Adversarial Training | 1 + ...ro-Shot Learning for Novel Attribute Synthesis | 1 + ...trategies Feasible with Neural Tangent Kernels | 1 + ...lack-box Explanations Using Dependence Measure | 1 + ...timal-Transport Flows for Trajectory Inference | 1 + ...arning with Class-Level Overfitting Mitigation | 1 + ...Variational Inference with Markovian Gradients | 1 + .../neurips/Markovian Interference in Experiments | 1 + ...: Backdoor Attacks with Arbitrary Target Class | 1 + ...Matching Transformer for Few-Shot Segmentation | 1 + ...tent Reconstruction for Reinforcement Learning | 1 + ... via Reinforced Visual Representation Learning | 1 + ...ng Spurious Correlations by Forcing to Explore | 1 + ...Masked Autoencoders As Spatiotemporal Learners | 1 + data/2022/neurips/Masked Autoencoders that Listen | 1 + ...for Scalable and Generalizable Decision Making | 1 + ...etworks are Data-Efficient Generation Learners | 1 + ...d Prediction: A Parameter Identifiability View | 1 + .../Matching in Multi-arm Bandit with Collision | 1 + ...Zero-Sum Games: Conservation Laws & Recurrence | 1 + .../neurips/Matryoshka Representation Learning | 1 + ...orst-Case Robustness to Model Misspecification | 1 + ... under Market Shrinkage and Market Uncertainty | 1 + ... in Multi-armed Bandits with Graph Information | 1 + ...ass Separation as Inductive Bias in One Matrix | 1 + ...Retrieval: Late and Early Interaction Networks | 1 + ...Training of Implicit Nonlinear Diffusion Model | 1 + ...tinal ganglion cells with deep denoiser priors | 1 + ...forcement Learning with Finite-Time Guarantees | 1 + ...ensional Binary Markov Gaussian Mixture Models | 1 + ...th User-level Privacy under Data Heterogeneity | 1 + ...s of Information Reflect Memorization Patterns | 1 + ...on Defenses in Collaborative Inference Systems | 1 + ...te Regression in Structured Prediction for NLP | 1 + ...the Training Dynamics of Large Language Models | 1 + ...al Networks with Minimum Over-parameterization | 1 + ...Efficient Continual Learning with Transformers | 1 + .../Memory safe computations with XLA compiler | 1 + .../Merging Models with Fisher-Weighted Averaging | 1 + .../Mesoscopic modeling of hidden spiking neurons | 1 + ...Training Tasks - a Density Estimation Approach | 1 + ...Meta-Dataset for Few-Shot Image Classification | 1 + ...ving Parametric Partial Differential Equations | 1 + ...emantics of Short Texts in Neural Topic Models | 0 ...t by Meta-Distillation from Mixture-of-Experts | 1 + ...ning Dynamics Forecasting Using Task Inference | 1 + ...a-Learning with Self-Improving Momentum Target | 1 + ...mativeness Dilemma in Open-set Active Learning | 1 + ...orcement Learning with Self-Modifying Networks | 1 + ...ng for Preference-based Reinforcement Learning | 1 + ...ng within randomly initialized neural networks | 1 + ...sional Confounder for Self-Supervised Learning | 1 + ...in Adaptation for Medical Image Classification | 1 + ...f Correlation Exploring in Similarity Learning | 1 + ...h Modeling for Graph Variational Auto-Encoders | 1 + ... 
Q-Learning for Offline Reinforcement Learning | 1 + ...ntation Framework without Video-based Training | 1 + ...structing complex images from brain activities | 1 + ...ulti-modal Contrastive Representation Learning | 1 + ... Embodied Agents with Internet-Scale Knowledge | 1 + ...Cooperative Multi-Agent Reinforcement Learning | 1 + ...ithms for Fixed-Budget Best Arm Identification | 1 + ...dget Best Arm Identification in Linear Bandits | 1 + ...nline Imitation Learning via Replay Estimation | 1 + .../neurips/Minimax Regret for Cascading Bandits | 1 + ...ent RL in Markov Games With a Generative Model | 1 + ...ulti-Label Samples from Single Positive Labels | 1 + ...A Simple Baseline for Incremental Segmentation | 1 + ...ized Margin and Can Be Implemented Efficiently | 1 + ...re Spaces, with application to Sinkhorn and EM | 1 + ...t Model-Policy Optimization for Model-Based RL | 1 + ...ing Data with Continuous Additive Noise Models | 1 + ...ierarchical Models and Hamiltonian Monte Carlo | 1 + ...ecified Phase Retrieval with Generative Priors | 1 + ...ogy with Data Mixing for Domain Generalization | 1 + .../Mixture-of-Experts with Expert Choice Routing | 1 + ...ti-Task Dataset for Simulated Humanoid Control | 1 + ...: Model-based Counterfactual Data Augmentation | 1 + ... Object Detection with Ground Depth Estimation | 1 + ...zed Vectors for High-Fidelity Image Generation | 1 + ...del Preserving Compression for Neural Networks | 1 + ...f Diverse Populations of Neural Network Models | 1 + ...del-Based Imitation Learning for Urban Driving | 1 + ...rning with Pessimism-Modulated Dynamics Belief | 1 + data/2022/neurips/Model-Based Opponent Modeling | 1 + ...inforcement Learning with Bayesian Exploration | 1 + ...g: Structural Conditions and Sample Complexity | 1 + ...trained Proximal Policy Optimization Algorithm | 1 + ...rough Resource-Rational Reinforcement Learning | 1 + ...Directed Graphs via Binary Code Box Embeddings | 1 + .../Modeling the Machine Learning Multiverse | 1 + ... Fourier Lens on Distribution Shift Robustness | 1 + ...kdoor Defender for Pre-trained Language Models | 1 + ...dular Flows: Differential Molecular Generation | 1 + ...dule-Aware Optimization for Auxiliary Learning | 1 + ...on by Principal Subgraph Mining and Assembling | 1 + ...ributionally Robust Tree Structured Prediction | 1 + ...ion Shifts in Data-Free Knowledge Distillation | 1 + ...omentum Aggregation for Private Non-convex ERM | 1 + ...ues for Neural Implicit Surface Reconstruction | 1 + ...ocular Dynamic View Synthesis: A Reality Check | 1 + ...cement Learning from Suboptimal Demonstrations | 1 + ... Carlo Tree Descent for Black-Box Optimization | 1 + ...ion for High Dimensional Bayesian Optimization | 1 + ... Injecting Morphology in Tensorized Embeddings | 1 + ...ns Can Win the Lottery Without Excessive Depth | 1 + ...ion Localization and Local Movement Refinement | 1 + ...zation with Application to Wind Energy Systems | 1 + ...former for 3D Object Detection on Point Clouds | 1 + ...n for Decentralized Optimization and Averaging | 1 + ...cement Learning is a Sequence Modeling Problem | 1 + .../neurips/Multi-Class $H$-Consistency Bounds | 1 + .../Multi-Fidelity Best-Arm Identification | 0 .../2022/neurips/Multi-Game Decision Transformers | 1 + ...ralized Medical Visual Representation Learning | 1 + ...diction and Out-of-Distribution Generalization | 1 + ...ivil Rights Lawsuits at Multiple Granularities | 1 + ...timodal Pre-training for Cross-modal Retrieval | 0 ... 
Deep Learning with Adaptive Reference Vectors | 1 + ...i-Sample Training for Neural Image Compression | 1 + ...le Adaptive Network for Single Image Denoising | 1 + .../Multi-agent Dynamic Algorithm Configuration | 1 + ...Greedy Deployment and Consensus Seeking Agents | 1 + ...plications in Multi-task Deep AUC Maximization | 1 + ...timator for Coupled Compositional Optimization | 1 + ... of Transformers for Robust Action Recognition | 1 + ...delity Monte Carlo: a pseudo-marginal approach | 1 + ...te Evolution Under Random Convolutional Design | 1 + ...r Weakly-Supervised Audio-Visual Video Parsing | 1 + ...ta Generation with Correlated Property Control | 1 + ...ew Subspace Clustering on Topological Manifold | 0 ...el Classification against Adversarial Examples | 1 + ...g for 3D environments with articulated objects | 1 + ...ltiagent Q-learning with Sub-Team Coordination | 1 + ...k: Universal Rates and Partial Concept Classes | 1 + ...Comment Detection at Scale for Indic Languages | 1 + ...h LIMoE: the Language-Image Mixture of Experts | 1 + ...A Neural Model for Bilingual Cognitive Reserve | 1 + ...with Temporal Polynomial Graph Neural Networks | 1 + ... Body Reconstruction from Uncalibrated Cameras | 1 + ... Coarse-Grained Attention for Music Generation | 1 + ...nified Metric for Multimodal Generative Models | 1 + ...idge trajectory optimization and deep learning | 1 + ...ask Learning with Model-Accelerator Co-design" | 1 + ...ng Neural Architecture Search on Diverse Tasks | 1 + ... Benchmarking Graph Neural Architecture Search | 1 + ...ro: Accelerating Research on Zero Cost Proxies | 1 + ...rior for Effective Unsupervised Shape Matching | 1 + ...linear Manifold Decoders for Operator Learning | 1 + ...t-time Adaptation Against Temporal Correlation | 1 + .../NS3: Neuro-symbolic Semantic Code Search | 1 + ...bilistic Framework for Satisfiability Problems | 1 + ...ssive Generation for Infinite Visual Synthesis | 1 + ...owards Boosting Black-box Unrestricted Attacks | 1 + ...ables fast sampling in spiking neural networks | 1 + ...iational information bottleneck representation | 1 + ...ematical Proof Generation with Language Models | 1 + ... Pseudo-Task Simulation for Continual Learning | 1 + ...: Neural Motion Fields for Kinematic Animation | 1 + ... Reinforcement Learning for Deterministic MDPs | 1 + ... Kronecker-Structured Random Tensor Embeddings | 1 + ...Near-Optimal Collaborative Learning in Bandits | 1 + ...ar-Optimal Correlation Clustering with Privacy | 1 + ...cement Learning in Non-Stationary Environments | 1 + ...Multi-Agent Learning for Safe Coverage Control | 1 + ...ret Learning Dynamics for General Convex Games | 1 + ...ar-Optimal Private and Scalable $k$-Clustering | 0 ...loration for Tabular Markov Decision Processes | 1 + ... Bounds for Multi-batch Reinforcement Learning | 1 + ...r Adversarial MDP with Delayed Bandit Feedback | 1 + ... Sample Complexity Bounds for Constrained MDPs | 1 + ...ontextual Bandits with Adversarial Corruptions | 1 + ...ithms for Online Learning with Feedback Graphs | 1 + ...ght Bounds for Testing Histogram Distributions | 0 ...d Benchmark for Offline Reinforcement Learning | 1 + ...ameter-Agnostic Nonconvex Minimax Optimization | 1 + ... localisation under local differential privacy | 1 + ... 
Adaptive Overfitting for Neural Shape Editing | 1 + ...ral Geometry and Physics from Monocular Videos | 1 + ...ur2SP: Neural Two-Stage Stochastic Programming | 1 + ...Enabling Parametric Photonic Device Simulation | 1 + data/2022/neurips/Neural Abstractions | 1 + ...al Approximation of Graph Topological Features | 1 + data/2022/neurips/Neural Attentive Circuits | 1 + .../Neural Basis Models for Interpretability | 1 + ...cuit Architectural Priors for Embodied Control | 1 + ...eometric Analysis over the Riemannian Manifold | 1 + ...nservation Laws: A Divergence-Free Perspective | 1 + ... Neural Nets Through Continuous Learning Rules | 1 + ...pplications to Differentiable Subset Selection | 1 + ...wn Nonlinear Systems with Stability Guarantees | 1 + ...n of Matching Fields for Visual Correspondence | 1 + ...al Network Architecture Beyond Width and Depth | 1 + ...d Stable Payoff Allocations Among Team Members | 1 + ...ing with Discrete Functions in High Dimensions | 1 + data/2022/neurips/Neural Shape Deformation Priors | 1 + ...ctive on Heterophily and Oversmoothing in GNNs | 1 + data/2022/neurips/Neural Stochastic Control | 1 + ...Learning of Continuous Spatiotemporal Dynamics | 1 + ... of Dynamic Scenes with Monocular RGB-D Camera | 1 + ...ion Learning on Continuous-Time Dynamic Graphs | 1 + ...al Topological Ordering for Computation Graphs | 1 + .../neurips/Neural Transmitted Radiance Fields | 1 + ...ntangled Framework for Complex Query Answering | 1 + ...sed Scheduling Method for High-level Synthesis | 1 + ...Steady Response Leads to Better Generalization | 1 + ... for Sequence Data with Relational Constraints | 1 + ...Methods: Staying Intrinsic, Complete and Sound | 1 + ...imation and a Generalized Fingerprinting Lemma | 1 + ...h Models of the Entorhinal-Hippocampal Circuit | 1 + ...es and adaptivity via learning rate separation | 1 + ...ent learning one step closer to the real world | 1 + ...e Learning Transformer for Node Classification | 1 + ...Enhancing Noise Robustness by Gradient Scaling | 1 + ...level Games with Critical Point Selection Maps | 1 + data/2022/neurips/Non-Gaussian Tensor Programs | 1 + data/2022/neurips/Non-Linear Coordination Graphs | 1 + ...or Contrastive Learning of Sentence Embeddings | 1 + ...s via Interpretable Multiple Instance Learning | 1 + ...C-Based Non-Autoregressive Machine Translation | 1 + ...yoffs: Improved Planning with Sublinear Regret | 1 + ...ex online learning via algorithmic equivalence | 1 + data/2022/neurips/Non-deep Networks | 1 + ...isspecification in Models of Molecular Fitness | 1 + ...lization in the Bandits with Knapsacks Problem | 1 + ...d Registration with Neural Deformation Pyramid | 1 + .../neurips/Non-stationary Bandits with Knapsacks | 1 + ...ng the Stationarity in Time Series Forecasting | 1 + .../Nonlinear MCMC for Bayesian Machine Learning | 1 + ...ion Reduction with a Stochastic Neural Network | 1 + ...ive Tensor Completion via Integer Optimization | 1 + ...cation for Single Deterministic Neural Network | 1 + ...nary Dual Averaging and Online Fair Allocation | 1 + ...for Knockoff-free Controlled Feature Selection | 1 + ...: Heterogeneous Precisions via Trainable Noise | 1 + ... theoretical analysis of graph (over)smoothing | 1 + ...gmentation from Rigid Dynamics of Point Clouds | 1 + ... Labels for Investigating Visual Eye Semantics | 1 + ... 
of Message-Passing GNNs in Larger Test Graphs | 1 + ...thogonal Propagation with Ego-Network Modeling | 1 + ...Data Subset Selection under Distribution Shift | 1 + ...Fake Detection via One-Shot Test-Time Training | 1 + ...owledge Graph Embeddings via Optimal Transport | 1 + ...s Sequences with Class Prompt for Visual Tasks | 1 + ...ement Algorithms with Implicit Differentiation | 1 + .../Object Scene Representation Transformer | 1 + .../Object-Category Aware Reinforcement Learning | 1 + ...lel Data Balanced in Gender within Occupations | 1 + ...r Action-Dependent Non-stationary Environments | 1 + ...Decision Processes under Non-Parametric Models | 1 + ... with Deficient Support Using Side Information | 1 + ...on with Policy-Dependent Optimization Response | 1 + data/2022/neurips/Off-Team Learning | 1 + ...orcement Learning via $f$-Advantage Regression | 1 + ...forcement Learning with Knowledge Distillation | 1 + ...ing Better by Making Statistical Matches Match | 1 + ...radient Flow can Make Vanilla-GCNs Great Again | 1 + ...el for Image-Language and Video-Language Tasks | 1 + .../On A Mallows-type Model For (Ranked) Choices | 1 + ...ties of Diffusion-based Deep Generative Models | 1 + ...Teaching with Sample Complexity Bounded by VCD | 1 + ... Probabilistic Explanations for Decision Trees | 1 + ...ty Invariant Bounds, Non-smoothness and Beyond | 1 + ...n and Estimation of Distributions on Manifolds | 1 + ...ivergence Measures for Bayesian Pseudocoresets | 1 + ...t Online Imitation Learning via Classification | 1 + ...ies for Bandit Fixed-Confidence Identification | 1 + ...or Numerical Features in Tabular Deep Learning | 1 + ...ed Meta-Learning for Rapid Few-Shot Adaptation | 1 + ...rning in the Presence of Spurious Correlations | 1 + ...dent Bounds for Offline Reinforcement Learning | 1 + ... of the Optimal Solutions to Accuracy and Dice | 1 + ...arations Between Simple and Optimal Mechanisms | 1 + ...ernelized Multi-Armed Bandits with Constraints | 1 + ...ng Fairness and Accuracy on Multiple Subgroups | 1 + ...n in Noninteractive Local Differential Privacy | 0 ...ditional Mutual Information For Generalization | 1 + ...argin Maximization in Linear and ReLU Networks | 1 + ...gins and Generalisation for Voting Classifiers | 1 + ...n Measuring Excess Capacity in Neural Networks | 1 + ...n-Linear operators for Geometric Deep Learning | 1 + ...Optimal Learning Under Targeted Data Poisoning | 1 + ...rsonalization in Cross-Silo Federated Learning | 1 + ...anguage Models with no Catastrophic Forgetting | 1 + .../neurips/On Robust Multiclass Learnability | 1 + ...sonalized Collaborative and Federated Learning | 1 + data/2022/neurips/On Scalable Testing of Samplers | 1 + ...na for Randomly Initialized Recurrent Networks | 1 + ...cle-Consistent Generative Adversarial Networks | 1 + ...d Data Augmentation in Bayesian Classification | 1 + ...o infinite width using linear parameterization | 1 + ...e Adversarial Robustness of Mixture of Experts | 1 + ... the Complexity of Adversarial Decision Making | 1 + ...nce Theory for Hessian-Free Bilevel Algorithms | 1 + ...lti-Objective Gradient Manipulation and Beyond | 1 + ... 
Mean Aggregation Feature Imputation in Graphs | 1 + ...ent of Random Features Models Trained with SGD | 1 + ...ent Modality on Offline Reinforcement Learning | 1 + ...orks: Convergence Guarantees and Implicit Bias | 1 + ...Fine-tuning Versus Meta-reinforcement Learning | 1 + ...pschitz-Driven Rehearsal in Continual Learning | 1 + .../On the Effectiveness of Persistent Homology | 1 + ...uracy Optimality of Profile Maximum Likelihood | 1 + ...he Epistemic Limits of Personalized Prediction | 1 + .../On the Frequency-bias of Coordinate-MLPs | 0 ...lity and Predictability of Recommender Systems | 1 + ...fitted Three-Layer Neural Tangent Kernel Model | 1 + ...oftmax Gradient Play in Markov Potential Games | 1 + ...iability of Nonlinear ICA: Sparsity and Beyond | 1 + ...rtance of Gradient Norm in PAC-Bayesian Bounds | 1 + ...ral Networks Through Model Gradient Similarity | 1 + ... the Learning Mechanisms in Physical Reasoning | 1 + ...itations of Stochastic Pre-processing Defenses | 1 + ... Initialization of Diagonal State Space Models | 1 + ...entation Collapse of Sparse Mixture of Experts | 1 + ...ering Models: Adversarial Attacks and Defenses | 1 + ...aph Neural Diffusion to Topology Perturbations | 1 + ...Scaling Rules for Adaptive Gradient Algorithms | 1 + ...Machine Learning: A Maximum Deviation Approach | 1 + ...Stabilizing LTI Systems on a Single Trajectory | 1 + ...al Neural Tangent and Gaussian Process Kernels | 1 + ... and Scalability of Node Perturbation Learning | 1 + ...cy of Reward-Free Exploration in Non-Linear RL | 1 + ...on Between Model Invariance and Generalization | 1 + ...ning Models and their Internal Representations | 1 + ...f Noise Correlation in Stochastic Optimization | 1 + ...n the Tradeoff Between Robustness and Fairness | 1 + ... Receiver Operating Characteristic (ROC) curve | 1 + ...to Nash equilibria in general stochastic games | 1 + ...es in the likelihood for variational inference | 1 + ...ficulty of learning chaotic dynamics with RNNs | 1 + ...on of learning algorithms that do not converge | 1 + ...ion to optimally learn compositional functions | 1 + ...eep learning: quantifying the cost of symmetry | 1 + ...iational inference and auto-associative memory | 1 + ...ce learning with linear function approximation | 1 + ... 
Vector Diffusion Maps on the Projective Plane | 1 + ...Learning Optimally from Multiple Distributions | 1 + .../neurips/On-Device Training Under 256KB Memory | 1 + ...n Image Manipulation with Semantic Modulations | 1 + ...ve Multi-Label Learning with Label Enhancement | 1 + ...ic and Preference Learning over Multiple Users | 1 + ...Position Encoding for Point Cloud Registration | 1 + ...ackdoor Erasing via Adversarial Weight Masking | 1 + ...Shot Object Pose Estimation without CAD Models | 1 + .../neurips/Online Agnostic Multiclass Boosting | 1 + .../Online Algorithms for the Santa Claus Problem | 1 + ...d Learning in the Presence of Strategic Agents | 1 + ...-Consistency Tradeoffs for the Two-Stage Model | 1 + ...nts: Towards the Best of Two Worlds and Beyond | 1 + data/2022/neurips/Online Decision Mediation | 1 + ...brium Learning for Regularization by Denoising | 1 + .../Online Frank-Wolfe with Arbitrary Delays | 1 + ...ork Revenue Management with Reusable Resources | 1 + ...ation: Multicalibeating and Other Applications | 1 + ...tion with Hierarchical Dirichlet Point Process | 1 + data/2022/neurips/Online PAC-Bayes Learning | 1 + ...Reinforcement Learning for Mixed Policy Scopes | 1 + ...ining Through Time for Spiking Neural Networks | 1 + ...ion for Ontological Multi-Label Classification | 0 ...Dataset - With Application to Super-Resolution | 1 + ...orcement Learning with Neural Reward Functions | 1 + ...AUC: Towards AUC-Oriented Open-Set Recognition | 1 + ...Benchmark Datasets for Full Waveform Inversion | 1 + ...ize Research Access to Social Media AR Filters | 1 + ...king Generalized Out-of-Distribution Detection | 1 + ...sing intraoperative stimulated Raman histology | 1 + ...a Transparent Evaluation of Model Explanations | 1 + ...ined connectivity of recurrent neural networks | 1 + .../neurips/Operator Splitting Value Iteration | 1 + ...entralized Stochastic Variational Inequalities | 1 + .../Optimal Binary Classification Beyond Accuracy | 1 + ...ccurate Post-Training Quantization and Pruning | 1 + ...r Adaptive Online Learning with Switching Cost | 1 + .../neurips/Optimal Dynamic Regret in LQR Control | 1 + ...fficiency-Envy Trade-Off via Optimal Transport | 1 + ...imal Distributed Optimization Under Similarity | 0 ...Latent Transformation for Contrastive Learning | 1 + ...uery Complexities for Dynamic Trace Estimation | 1 + ...egularized Conditional Mean Embedding Learning | 1 + ... 
Locally Balanced Proposals in Discrete Spaces | 1 + .../Optimal Transport of Classifiers to Fairness | 1 + ...entity-invariant Facial Expression Recognition | 1 + data/2022/neurips/Optimal Weak to Strong Learning | 1 + ...mal and Adaptive Monteiro-Svaiter Acceleration | 1 + .../neurips/Optimal-er Auctions through Attention | 1 + ...Coarse Correlated Equilibria in Bimatrix Games | 1 + ...Learning with Few Samples and Tight Guarantees | 1 + ...rches for Combinatorial Black-Box Optimization | 0 ...ptimizing Data Collection for Machine Learning | 1 + ...aps of Vision Transformers Improves Robustness | 1 + ...el Selection in Offline Reinforcement Learning | 1 + ...cient Online Learning for Smoothed Adversaries | 0 ...dinality Estimators Are Differentially Private | 1 + .../neurips/Ordered Subgraph Aggregation Networks | 1 + ...Prompts for Language-Guided Ordinal Regression | 1 + ...nsformer Backbone with Token Orthogonalization | 1 + ...sion and Procession of Hippocampal Place Cells | 1 + ...tion via Conditional Kernel Independence Model | 1 + ...kelihood Ratio on Informative Hierarchical VAE | 1 + ...e Limit of Low-bit Transformer Language Models | 1 + ... Sparse Estimation via Non-Convex Optimization | 1 + ...Mean Estimation for Heavy-Tailed Distributions | 1 + ...a Efficient Collaborative Open-Source Sampling | 1 + ...arameterization from Computational Constraints | 1 + ...t Cloud Analysis with Point-to-Pixel Prompting | 1 + ... So Tight That They Can Explain Generalization | 1 + ...dictions in Multi-Agent Reinforcement Learning | 0 .../neurips/PALBERT: Teaching ALBERT to Ponder | 1 + ...ion Loop with Memory for Long-Horizon Planning | 1 + ...sive Benchmark for Scientific Machine Learning | 1 + ...ted Domain Programming, Learning, and Planning | 0 ...k Benchmark for Protein Sequence Understanding | 1 + ... Detectors via Pearson Correlation Coefficient | 1 + ...try Dataset for Machine Learning in Proteomics | 0 ...ompositional Multi-task Reinforcement Learning | 1 + ...e Latent Manifold for Unsupervised Pretraining | 1 + ...sion via Alternating Reverse Filtering Network | 1 + ...arallel Tempering With a Variational Reference | 1 + ...l Transport with semi-dual Brenier formulation | 1 + .../neurips/Parameter-Efficient Masking Networks | 1 + ...ee Dynamic Graph Embedding for Link Prediction | 1 + ...ee Regret in High Probability with Heavy Tails | 1 + ... Overparameterization and Membership Inference | 1 + ...etargetable Decision-Makers Tend To Seek Power | 1 + ...ng Is All You Need for Novel Object Captioning | 1 + ...ing for Expensive Multi-Objective Optimization | 1 + ...atment Effects with Implicit Generative Models | 1 + ...s for 3D Shape Completion on Unseen Categories | 1 + ...pen-vocabulary models by interpolating weights | 1 + ...odels Can Better Exploit Test-Time Computation | 1 + ...misconceptions about Lipschitz neural networks | 1 + .../PeRFception: Perception using Radiance Fields | 1 + .../neurips/Peer Prediction for Learning Agents | 1 + ...ce Image Quality Models with Human-in-the-Loop | 1 + .../Perfect Sampling from Pairwise Comparisons | 1 + ...DouDizhu with Perfect Information Distillation | 1 + data/2022/neurips/Performative Power | 1 + ...rmers for Crystal Material Property Prediction | 1 + data/2022/neurips/Peripheral Vision Transformer | 1 + ...munication Efficiency, Robustness and Fairness | 0 ...nline Federated Learning with Multiple Kernels | 1 + .../Perturbation Learning Based Anomaly Detection | 1 + ...on from Clean Training to Adversarial Training | 1 + ... 
in high-dimensional two-layer neural networks | 1 + .../Phase transitions in when feedback is useful | 1 + ...fusion Models with Deep Language Understanding | 1 + ...issue Deformation in Image-Guided Neurosurgery | 1 + ...ed Face Rendering for NIR-VIS Face Recognition | 1 + ...ral PDE Solvers with Mixed Boundary Conditions | 1 + ...t Representations of Equilibrium Network Flows | 1 + ...ic Monoculture lead to Outcome Homogenization? | 1 + ... the Law and a 256GB Open-Source Legal Dataset | 1 + ...ainty Quantification through Loss Minimisation | 1 + ...g Model For Model-Based Reinforcement Learning | 1 + ...anning for Sample Efficient Imitation Learning | 1 + ...izon of BAMDPs via Epistemic State Abstraction | 1 + ...nd, and Snow for Optimization Time Integration | 1 + ... Image Completion with Gaussian Mixture Models | 0 ...d Vector Attention and Partition-based Pooling | 1 + ...ders for Hierarchical Point Cloud Pre-training | 1 + ... with Improved Training and Scaling Strategies | 1 + ...l Action Detection with Learnable Query Points | 1 + data/2022/neurips/Poisson Flow Generative Models | 1 + ... Augmentation Technique for LiDAR Point Clouds | 1 + ...cy Gradient With Serial Markov Chain Reasoning | 1 + ...ames: Unified Framework and Faster Convergence | 1 + ...ion for Long-Term Fairness in Decision Systems | 1 + ...ization with Linear Temporal Logic Constraints | 1 + ...t Multi-Task Adaptation for Dense Vision Tasks | 1 + ...lds for Subband Decomposition and Manipulation | 1 + ... time guarantees for the Burer-Monteiro method | 2 ++ ...libria with a Mediator in Extensive-Form Games | 1 + ...ental Design for Optimal Sparse Linear Bandits | 1 + ...Forests via Recursive Greedy Risk Minimization | 1 + ...ely Weighted Kernel Quadrature via Subsampling | 1 + ... estimators for learning to defer to an expert | 1 + ...r-independent Mechanisms with Value Maximizers | 1 + ...ior Collapse of a Linear Latent Variable Model | 1 + .../Posterior Matching for Arbitrary Conditioning | 1 + ... 
Sample Efficiency in Bayesian Neural Networks | 1 + ...omputational Uncertainty in Gaussian Processes | 1 + ...of single-qubit native quantum neural networks | 1 + ...s on Spatiotemporal Traffic Forecasting Models | 1 + ...al Adversarial Multivalid Conformal Prediction | 1 + ...ical Demonstrations in Multi-Goal Environments | 1 + ...sian Transfer Learning with Informative Priors | 1 + ...or Generalizable Visual Reinforcement Learning | 1 + ...anguage Models for Interactive Decision-Making | 1 + ...ty Evaluation for Small-Data Transfer Learning | 1 + ...tivation Distributions Expose Backdoor Neurons | 1 + .../neurips/Pre-trained Adversarial Perturbations | 1 + ...der Scalings for Dot-product Kernel Regression | 0 ...or Log-loss via a Truncated Bayesian Algorithm | 1 + ...Drug Perturbations at a Single-Cell Resolution | 1 + ...ng Label Distribution from Multi-label Ranking | 1 + ...redictive Coding beyond Gaussian Distributions | 1 + ...ying for Autoregressive Neural Sequence Models | 1 + ...by Not-True Distillation in Federated Learning | 1 + ...on-Computation Gaps and Sparse Mean Estimation | 1 + ...ent: More Iterations without More Privacy Loss | 1 + .../neurips/Private Estimation with Public Data | 1 + ...Path Distance Release with Improved Error Rate | 1 + data/2022/neurips/Private Isotonic Regression | 1 + .../Private Multiparty Perception for Navigation | 1 + ...Set Generation with Discriminative Information | 1 + ...ta for Multitask Learning and Marginal Queries | 1 + ...on-Efficient Algorithms for Entropy Estimation | 1 + ...utation for Mixed Categorical and Ordered Data | 1 + ...tributions for RNA Folding and Molecule Design | 1 + ... Generalization via Quantile Risk Minimization | 1 + ...e Unreliable for Concept Removal and Detection | 1 + ...ral Image Programs for Representation Learning | 1 + ...r Revenue Maximization with Multiple Purchases | 1 + ...ect Segmentation from Real-world Single Images | 1 + ...Randomized Gradient Smoothing and Quantization | 0 ...ustomizable and Composable Learning Algorithms | 1 + ...elf-Explainable Prototypical Variational Model | 1 + ...a Reinforcement Learning Agent via Prototyping | 1 + ...t for Few-Shot 3D Point Cloud Object Detection | 1 + ...resentation Learning in Reinforcement Learning | 1 + ...st Backdoor Policies in Reinforcement Learning | 1 + ...entation Learning in Multitask Bandits and MDP | 1 + ...erparameterized Meta-learning Trained with SGD | 1 + ...e Identification Under Post-Nonlinear Mixtures | 1 + ... of Out-of-Distribution Data (Almost) for Free | 1 + ...strained RL with Linear Function Approximation | 1 + ...Reinforcement Learning via Strategy-wise Bonus | 1 + ...ning in Partially Observable Dynamical Systems | 1 + ...nforcement Learning via Active Reward Learning | 1 + .../Provably expressive temporal graph networks | 1 + ...RL with side information about latent dynamics | 1 + ...rovably tuning the ElasticNet across instances | 1 + ...imal Learning With Opponent-Learning Awareness | 1 + .../neurips/Proximal Point Imitation Learning | 1 + ...ong rat visual cortex and deep neural networks | 1 + ...ts and Convex Geometry: Towards No Assumptions | 1 + ...uning has a disparate impact on model accuracy | 1 + ...hrough the Lens of Training and Regularization | 1 + ...Pseudo-Riemannian Graph Convolutional Networks | 1 + ...er Co-Attention for Social Text Classification | 1 + ... for Pulsative Physiological Signal Imputation | 1 + .../Pure Transformers are Powerful Graph Learners | 1 + ... 
impossibility: Who's the fairest of them all? | 1 + ...yramid Attention For Source Code Summarization | 0 ...lignment for Vision-language Model Pretraining | 1 + ...toencoders in Python - A Benchmarking Use Case | 1 + ...and Fully Quantized Low-bit Vision Transformer | 1 + ...Controllable Image Generation and Manipulation | 1 + ...ble Text Generation with Reinforced Unlearning | 1 + ... between Dataset Design and Robustness of CLIP | 1 + ...ased Image Segmentation by Selective Inference | 1 + ...ning Framework Constraining Outage Probability | 1 + ...d Training of Gradient Boosting Decision Trees | 1 + ...ributions and Estimating Normalizing Constants | 1 + ...o Logarithmic Regret Stochastic Convex Bandits | 1 + ...Quasi-Newton Methods for Saddle Point Problems | 1 + ... Regression via Spatial-Aware Part-Level Query | 1 + ...Dynamic Capacity Region of Multiplayer Bandits | 1 + ...e Key Towards Long-Term Multi-Object Tracking? | 1 + ...ial Model-Based Offline Reinforcement Learning | 1 + ...s in Knowledge-Based Visual Question Answering | 1 + ...zed Decision Learning with Sensitive Variables | 1 + .../RKHS-SHAP: Shapley Values for Kernel Methods | 1 + ...raining for Human-Object Interaction Detection | 1 + ...Stable Assemblies of Recurrent Neural Networks | 1 + ...nforcement Learning via Conservative Smoothing | 1 + ...ive Augmentations for Self-supervised Learning | 1 + ...al-Time Semantic Segmentation with Transformer | 1 + ...enchmark for Spatial Precipitation Downscaling | 1 + ...malization Aggregation for Adversarial Defense | 1 + ...ly Fair Randomized Facility Location Mechanism | 1 + .../neurips/Random Sharpness-Aware Minimization | 1 + ...ckdoor Attack Detection without Clean Datasets | 1 + ...ray-box Certificates for Graph Neural Networks | 1 + ... Clustering: Fast and Optimal Kernel $k$-Means | 1 + .../Rank Diminishing in Deep Neural Networks | 1 + ...ture Removal for Out-of-distribution Detection | 1 + ... Model Architecture Adaption for Meta-Learning | 1 + ...opolis Algorithms for Model Selection Problems | 1 + ...ems: Finding Lottery Tickets at Initialization | 1 + ... for Predictive Multiplicity in Classification | 1 + ... Generalization Error for Distributed Learning | 1 + ...Convex Optimization in Adaptive Linear Control | 1 + ...Matrix Approximation via Dyson Brownian Motion | 1 + ...Retrieve and Co-segment for Zero-shot Transfer | 1 + ...ased Models from a Message-Passing Perspective | 1 + ... Unsuitable for Complex-Valued Neural Networks | 1 + ...rk Pruning and the Undecayed Pruning Algorithm | 1 + ...eceding Horizon Inverse Reinforcement Learning | 1 + ... General, Powerful, Scalable Graph Transformer | 1 + .../Recommender Forest for Efficient Retrieval | 1 + ...ing Training Data From Trained Neural Networks | 1 + ...nstruction on Trees and Low-Degree Polynomials | 1 + ... Text in Federated Learning of Language Models | 1 + .../Recruitment Strategies That Take a Chance | 1 + ...al Networks Learn Succinct Learning Algorithms | 1 + data/2022/neurips/Recurrent Memory Transformer | 1 + ...n Transformer with Guided Deformable Attention | 1 + ...inimax Games: A Level $k$ Gradient Play Method | 1 + .../2022/neurips/Recursive Reinforcement Learning | 1 + .../RecursiveMix: Mixed Learning with History | 1 + ...intrinsic rewards via constrained optimization | 1 + ...ghts and Activations for AdderNet Quantization | 1 + ... 
Fields for Effective Non-rigid Shape Matching | 1 + ...ence Diagrams of Networks: CoralTDA and PrunIT | 1 + ...Free Message Passing for Graph Neural Networks | 0 ...ns help generalization in wide neural networks | 1 + ...entanglement of Multilingual Translation Model | 0 ...or Information-Directed Reinforcement Learning | 1 + ...tilabel Classification in Sparse Label Regimes | 1 + ...unds for Risk-Sensitive Reinforcement Learning | 1 + ...nt Ascent for Two-Player Zero-Sum Markov Games | 1 + .../Regularized Molecular Conformation Fields | 1 + ...using Prior Computation to Accelerate Progress | 1 + ...etic Algorithm for Structure-based Drug Design | 1 + ...ss: Breaking the Dependence on the State Space | 1 + ... Learning with Automated Auxiliary Loss Search | 1 + ...ng with Logarithmic Regret and Policy Switches | 1 + ...forcement Learning with Neural Radiance Fields | 1 + ...ment Learning with Non-Exponential Discounting | 1 + .../Reinforcement Learning with a Terminator | 1 + ...ation-Constrained Decoding for Text Generation | 1 + ...t Relationships as Fine-Grained Discriminators | 1 + ...: Provable Efficiency and Applications to MARL | 1 + ...traints with Non-stationary Continuous Filters | 1 + ... into Addressable Memories for Neural Networks | 1 + ...cations to Private and Robust Machine Learning | 1 + ...ural Networks by Leaving the Right Past Behind | 1 + ...esenting Spatial Trajectories as Distributions | 1 + ...Optimization: Theoretical Framework and Limits | 1 + ...ent Reinforcement Learning Value Factorization | 1 + .../neurips/ResT V2: Simpler, Faster and Stronger | 1 + ... Filter Networks for Multiscale Reconstruction | 1 + ...lving the data ambiguity for periodic crystals | 1 + ...ed Learning with All-In-One Neural Composition | 1 + ...pecting Transfer Gap in Knowledge Distillation | 1 + ...Knowledge for Learning with Dynamic Definition | 1 + ...ignment in Video Super-Resolution Transformers | 1 + ...king Generalization in Few-Shot Classification | 1 + ...hinking Image Restoration for Object Detection | 1 + ...Cooperative Multi-Agent Reinforcement Learning | 1 + ...aph Evaluation Under the Open-World Assumption | 1 + ...ied Robustness: A Boolean Function Perspective | 1 + ... in the Context of Efficient Video Recognition | 1 + ...g for Generalization in Reinforcement Learning | 1 + ...Probabilistic Programs with Stochastic Support | 1 + ...apley Value-based Approach in Frequency Domain | 0 ...y Efficient Approach with Group Discrimination | 1 + ...ing the Reverse-engineering of Trojan Triggers | 1 + ...through regularization in the hyperbolic space | 1 + .../neurips/Retrieval-Augmented Diffusion Models | 1 + ...ing Accurate and Faithful Patient Instructions | 1 + ...tive Adversarial Replay for Continual Learning | 1 + ... 
of mSGD under milder requirement on step size | 1 + ...ting Active Sets for Gaussian Process Decoders | 1 + ...earning from the Perspective of Graph Spectrum | 1 + ...visiting Heterophily For Graph Neural Networks | 1 + ...iting Injective Attacks on Recommender Systems | 1 + ...ing Neural Scaling Laws in Language and Vision | 1 + ...s for Robust and Generalizable Stereo Matching | 1 + ...n-convex Stochastic Decentralized Optimization | 1 + ...nference and Adaptation by Anchored Clustering | 1 + ...n on Images: From Vectorization to Convolution | 1 + ...rse Convolutional Model for Visual Recognition | 1 + data/2022/neurips/Riemannian Diffusion Models | 1 + ...arning Stochastic Representations on Manifolds | 1 + .../Riemannian Score-Based Generative Modelling | 1 + ... for Least Squares in the Interpolation Regime | 1 + .../Risk-Driven Design of Perception Systems | 1 + ...Disabling Shortcuts and Learning New Knowledge | 1 + ... Anytime Learning of Markov Decision Processes | 1 + ...bust Bayesian Regression via Hard Thresholding | 1 + ...odels by Pruning Randomly-initialized Networks | 1 + ...ibration with Multi-domain Temperature Scaling | 1 + ...e-Level Adversaries are Interpretability Tools | 1 + ...d Method of Moments: A Finite Sample Viewpoint | 1 + ...ucture Learning via Multiple Statistical Tests | 1 + ...of a Few Demonstrations with a Backwards Model | 1 + ... Mirror Descent Inverse Reinforcement Learning | 1 + ...Robust Learning against Relational Adversaries | 1 + ... Selection and Nearly-Proper Learning for GMMs | 1 + .../neurips/Robust Models are less Over-Confident | 1 + ...ior Estimation and Statistical Model Criticism | 1 + ...nt Policy Evaluation in Reinforcement Learning | 1 + ...bust Reinforcement Learning using Offline Data | 1 + data/2022/neurips/Robust Rent Division | 1 + ...ised Learning when Not All Classes have Labels | 1 + data/2022/neurips/Robust Streaming PCA | 1 + ...bust Testing in High-Dimensional Sparse Models | 1 + ...dels Against Visual and Language Perturbations | 1 + .../Robustness Disparities in Face Detection | 1 + ...the bad (depth), and the ugly (initialization) | 1 + ...Depends on the Shape of the Noise Distribution | 1 + ...to Unbounded Smoothness of Generalized SignSGD | 1 + ...ures in Microservices through Causal Discovery | 1 + ...elds for Learning a Natural Illumination Prior | 1 + ...on Learning with Skew Rényi Divergence | 1 + ...u for Single-view Clothed Human Reconstruction | 1 + ... Occam's Razor for Domain Incremental Learning | 1 + ...Augmentation in Offline Reinforcement Learning | 1 + ...om Shading and Shadow under a Single Viewpoint | 1 + ...3GC: Scalable Self-Supervised Graph Clustering | 1 + ... as Multidimensional Signals with State Spaces | 0 ...tacking Lattice Cryptography with Transformers | 1 + ...trained Real-world Arbitrary Image collections | 1 + ...Aware Point Affiliation for Feature Upsampling | 1 + ... 
Method for Nonconvex-Concave Minimax Problems | 1 + ...-Aware Pipeline for Data Parallel DNN Training | 1 + ...Object-Centric Learning from Real-World Videos | 1 + ...sformer Pruning via Collaborative Optimization | 1 + ...or Camera Measurement of Physiological Signals | 1 + ...asting with Sample Convolution and Interaction | 1 + ...ly-Supervised Whole-Slide Image Classification | 1 + ...Unknown Environments by Volumetric Integration | 1 + ...ld through Simultaneous Generation and Mapping | 1 + ...apley Value Theory into Multi-Agent Q-Learning | 1 + .../SHINE: SubHypergraph Inductive Neural nEtwork | 1 + ...ions for Detecting Out-of-Distribution Objects | 1 + ...O: Smoothing Inference with Twisted Objectives | 1 + ...rated Gradients Estimation of Neuron Relevance | 1 + ...Flow: Learning Optical Flow with Super Kernels | 1 + ...ring and Process Control Learning Environments | 1 + .../SNAKE: Shape-aware Neural 3D Keypoint Field | 1 + ...twork through Regularized Adversarial Training | 1 + ...pretable unsupervised domain adaptation in EEG | 1 + ...nsupervised Multi-agent Reinforcement Learning | 1 + ...rmer for Dense Point Cloud Semantic Completion | 1 + ...for Learning Single Neurons with Massart Noise | 1 + ...ter-Efficient Image-to-Video Transfer Learning | 1 + ...tion Activity with Spatiotemporal Transformers | 1 + .../STaR: Bootstrapping Reasoning With Reasoning | 1 + .../Safe Opponent-Exploitation Subgame Refinement | 1 + ...m for Safety Evaluation of Autonomous Vehicles | 1 + ...namic Systems via Stochastic Barrier Functions | 1 + ...ageMix: Saliency-Guided Mixup for Point Clouds | 1 + .../Saliency-Aware Neural Architecture Search | 1 + ... Functions for Greedy-Best-First and A* Search | 1 + ...Sample Constrained Treatment Effect Estimation | 1 + ...Benchmark for Practical Molecular Optimization | 1 + ... Correlated Equilibria in Extensive-Form Games | 1 + ... Learning of Partially Observable Markov Games | 1 + ...e-Then-Optimize Batch Neural Thompson Sampling | 1 + ...istributions with Infinity-Distance Guarantees | 1 + ... Orthogonal-Space Variational Gradient Descent | 1 + ...Hamiltonian Monte Carlo in a Constrained Space | 1 + ...aster Rates in Finite-Sum Minimax Optimization | 1 + ... Temporal and Multi-Spectral Satellite Imagery | 1 + ...ass of Non-Convex Optimization with Guarantees | 1 + data/2022/neurips/Scalable Infomin Learning | 1 + .../Scalable Interpretability via Polynomials | 1 + ...ing Option Discovery based on Kronecker Graphs | 1 + ...esentations with Learnable Positional Features | 1 + ...extual Bandits with Constant Regret Guarantees | 1 + ...t Estimates of Continuous-Valued Interventions | 1 + ...cient Non-adaptive Deterministic Group Testing | 0 ...onal Neural Networks with Differential Privacy | 1 + ...sing Discrete Optimization with Graph Coloring | 1 + .../Scale-invariant Learning by Physics Inversion | 1 + ...res: A New Baseline for Efficient Model Tuning | 2 ++ ...ning via Cross-Modality Gradient Harmonization | 1 + ...d Diffusion meets Annealed Importance Sampling | 1 + ...Score-Based Generative Models Detect Manifolds | 1 + ...ng Secretly Minimizes the Wasserstein Distance | 1 + ...poral Alignment in Few-Shot Action Recognition | 1 + ... 
to Re-Align With Human Values from Text Edits | 1 + ...aussianization protocol for Federated Learning | 1 + ...dual and collective dynamics with transformers | 1 + ...nal Attention Design for Semantic Segmentation | 1 + ...ic Segmentation with Plain Vision Transformers | 1 + ...s via an Object-Centric Layered Representation | 1 + ...biased Learning by Contradicting-pair Sampling | 1 + ...sentations for variable-rate image compression | 1 + .../Self-Aware Personalized Federated Learning | 1 + ...ry of Kernel Evolution in Wide Neural Networks | 1 + .../Self-Explaining Deviations for Coordination | 1 + ...Cooperative Multi-agent Reinforcement Learning | 1 + ...ages as Differentiable Fractal Representations | 1 + ...erts for Test-Agnostic Long-Tailed Recognition | 1 + ...For Time Series via Time-Frequency Consistency | 1 + ...r Representation Learning without Demographics | 1 + ... Image Restoration with Blurry and Noisy Pairs | 1 + ...f-Supervised Learning Through Efference Copies | 1 + ...of Brain Dynamics from Broad Neuroimaging Data | 1 + ...Supervised Learning via Maximum Entropy Coding | 1 + ...ing with an Information Maximization Criterion | 1 + ...vised Pretraining for Large-Scale Point Clouds | 1 + ...Representation Learning with Semantic Grouping | 1 + ...plaining deep models with logic rule reasoning | 1 + ...lf-supervised Amodal Video Object Segmentation | 1 + ...ph Pre-training Based on Structural Clustering | 1 + ...epth estimation with volumetric feature fusion | 1 + ...uided Masking for Learning Masked Autoencoders | 1 + ...ic Diffusion Network for Semantic Segmentation | 1 + ...ge Abstractions and Pretrained Representations | 1 + ...obabilistic Layers for Neuro-Symbolic Learning | 1 + ...ainty intervals for disentangled latent spaces | 1 + ...zing Flows through Differentiable Tessellation | 1 + ... Generative Models for Multiagent Trajectories | 1 + ... Graph Laplacian Tree Alternating Optimization | 1 + ...tic Segmentation via Gentle Teaching Assistant | 1 + ...tion Based on Uncertainty-Guided Pseudo Labels | 0 ...finitely Constrained Markov Decision Processes | 1 + .../Semi-supervised Active Linear Regression | 1 + ...ith Prototype-based Consistency Regularization | 1 + .../Semi-supervised Vision Transformers at Scale | 1 + ... for Unlabeled Clients with Alternate Training | 1 + ...ate Text Generation via Knowledge Distillation | 1 + ...el Imitation Learning with Unobserved Contexts | 1 + .../neurips/Sequence-to-Set Generative Models | 1 + .../Sequencer: Deep LSTM for Image Classification | 1 + ...ation Design: Learning to Persuade in the Dark | 1 + ... Meta-Interpolation for Few-Task Meta-Learning | 1 + ...Bridging Offline and Online Knowledge Transfer | 1 + ... 
And Structure Preserving Differential Privacy | 1 + ...ages using Monte Carlo Rendering and Denoising | 1 + ...ive Text-Conditioned 3D Shape Generation Model | 1 + ...ge for Meta-learning with Feature Descriptions | 1 + ...on under Global Kurdyka-Lojasiewicz Inequality | 1 + ...ous SGD for Distributed and Federated Learning | 1 + .../neurips/Sharpness-Aware Training for Free | 1 + ...on for Safe Multi-Agent Reinforcement Learning | 1 + ...n Efficient ConvNet for Image Super-Resolution | 1 + .../neurips/SignRFF: Sign Random Fourier Features | 1 + ...Processing for Implicit Neural Representations | 1 + ...cal Perspectives and the Role of Rank Collapse | 1 + ...y with Non-Expansive Generative Network Priors | 1 + ...fare Maximization in Rich Advertising Auctions | 1 + ...c Learning for Complex and Naturalistic Videos | 1 + ...al Greedy Online Contention Resolution Schemes | 1 + .../Simplified Graph Convolution with Heterophily | 1 + ...m Search for Neural Combinatorial Optimization | 1 + ... Imputation and Structure Learning with Groups | 1 + ...an Homotopy Method for Non-convex Optimization | 1 + ...ainty Estimation via Stochastic Data Centering | 1 + ...elationship Learning using Conditional Queries | 1 + ...tion with Instance-sensitive Sample Complexity | 0 ...ase deep learning in cortico-cortical networks | 1 + ...gmentation requires Few-parameters Fine-tuning | 1 + ...ural networks: interpolation and approximation | 1 + ...g Size-Generalization in Graph Neural Networks | 1 + ...al Networks with Sublinear Training Complexity | 1 + ...Boosted Decision Tree for Multioutput Problems | 1 + ... Image Classification with Provable Guarantees | 1 + ... for Multi-task Offline Reinforcement Learning | 1 + ...xperts for fine-grained debugging and analysis | 1 + ...doors for Neural Networks Trained from Scratch | 1 + ...with Perturbed Payoffs and Unknown Transitions | 1 + ...hed Embeddings for Certified Few-Shot Learning | 1 + ...imization Based on Discounted-Normal-Predictor | 1 + ...ayesian Optimization with Pathwise Exploration | 1 + ...n arbitrary length-scales in molecular systems | 1 + ...Refinery for Imbalanced Partial-Label Learning | 1 + ...arning Elliptic Equations via Gradient Descent | 1 + ...cial Contagion Management with Task Migrations | 1 + ... Regret Bounds of Concurrent Thompson Sampling | 0 ...Unsupervised Anomaly Detection with Noisy Data | 1 + ...tative Reasoning Problems with Language Models | 1 + ...erated Learning with Communication Compression | 1 + ...ent Variables Given Local Background Knowledge | 0 ...d Complete Verification of Polynomial Networks | 1 + ...mulation Platform for Visual-Acoustic Learning | 1 + .../SparCL: Sparse Continual Learning on the Edge | 1 + ...rier Backpropagation in Cryo-EM Reconstruction | 1 + ...rocess Hyperparameters: Optimize or Integrate? | 1 + ...Detection Thresholds in Stochastic Block Model | 1 + ...ure Interaction Detection and Sparse Selection | 1 + ...Probabilistic Circuits via Pruning and Growing | 1 + .../Sparse Structure Search for Delta Tuning | 1 + ...g Tickets are Data-Efficient Image Recognizers | 1 + ...to Densify 3D Features for 3D Object Detection | 1 + .../Sparsity in Continuous-Depth Neural Networks | 1 + ...tiable Sparsity via Regularized Transportation | 1 + data/2022/neurips/Spatial Mixture-of-Experts | 1 + ... 
Convolution for Efficient 3D Object Detection | 1 + ...ing Set for Deep Networks in the Kernel Regime | 1 + ...lization in Image-based Reinforcement Learning | 1 + ...ely: Accelerating MCTS with Virtual Expansions | 1 + ...ical Channels for Modeling Atomic Interactions | 1 + ...zation Layer: Representation Using Only Angles | 1 + ...t-kl Inequalities for Ternary Random Variables | 1 + ...t Transformer for Automatic Speech Recognition | 1 + ... Generalization Bounds of Adversarial Training | 1 + ...f Gradient Methods for Shallow Neural Networks | 1 + ...n for Markov Chain Stochastic Gradient Methods | 1 + ...stering: from Single Kernel to Multiple Kernel | 1 + ...e Under Interference Without Network Knowledge | 1 + ...ttention for Recurrent Processing of Sequences | 1 + ... Sequence Modeling with Partially Labeled Data | 1 + ...ale Graph Building for Clustering and Learning | 0 ...verse Problems: A Stochastic Gradient Approach | 1 + ...al Guarantees for Sliced Wasserstein Distances | 1 + ...pproximating Turing Machines with Transformers | 1 + ...ks: A Social Psychology Perspective of Loafing | 1 + .../Stochastic Adaptive Activation Function | 1 + ...e Reduction for Stochastic Monotone Inclusions | 1 + ...stic Multiple Target Sampling Gradient Descent | 1 + ... Graphs: Finite-Time and Asymptotic Optimality | 1 + ...lexity of SGD for Gradient-Dominated Functions | 1 + ...astic Window Transformer for Image Restoration | 1 + ...reaming Radiance Fields for 3D Video Synthesis | 1 + ...k Dataset for Sub-second Action Identification | 1 + ... the Learnability of Gomory Mixed Integer Cuts | 1 + ... Optimization and Symbolical Optimal Transport | 1 + ...al Knowledge Distillation for Object Detection | 1 + ...ructural Pruning via Latency-Saliency Knapsack | 1 + ...Aware Image Segmentation with Homotopy Warping | 1 + ...D Garment Modeling with Neural Sewing Machines | 1 + .../neurips/Structured Energy Network As a Loss | 0 ...ion for Generative Models with Explaining Away | 1 + ...cturing Representations Using Group Invariants | 1 + ...d Sampling in Stochastic Segmentation Networks | 1 + ...lower bounds for Principal Components Analysis | 1 + .../Subgame Solving in Adversarial Team Games | 1 + ... On Trees: An Empirical Baseline Investigation | 1 + ...blinear Algorithms for Hierarchical Clustering | 1 + .../Submodular Maximization in Clean Linear Time | 1 + ...sion with Applications to Tensor Decomposition | 1 + ...type Alignment for Universal Domain Adaptation | 1 + ...om Heterogeneous Data with Non-isotropic Noise | 1 + ...transitions & Statistical-to-Computational gap | 1 + .../Supervised Training of Conditional Monge Maps | 1 + ...Fidelity Race of Hyperparameter Configurations | 1 + ...rt Recovery in Sparse PCA with Incomplete Data | 1 + ...ptimization for Offline Reinforcement Learning | 1 + ... Assist Blind Navigation in Urban Environments | 1 + ... Multi-Agent Learning with Energy-based Models | 0 ...Online Reinforcement Learning for Auto-bidding | 1 + ...e and Strong Baseline for Transformer Tracking | 1 + ...etricity for Neural Combinatorial Optimization | 1 + ...istillation for Learned TCP Congestion Control | 1 + ...try Teleportation for Accelerated Optimization | 1 + .../Symmetry-induced Disentanglement on Graphs | 1 + ...arning Hamiltonians from Noisy and Sparse Data | 1 + ...per-parameters in Contextual Bandit Algorithms | 1 + ... 
Collaborate to Improve Adversarial Robustness | 1 + ...ise Approach to Unsupervised Ensemble Learning | 1 + ...of neural network quantum states using Lanczos | 0 ...coding Scheme for Neural Network Architectures | 1 + ...y-Aware Large Scale Mixture-of-Expert Training | 1 + ...bust 3D Stylization via Lighting Decomposition | 1 + ... for Drug-Protein Binding Structure Prediction | 1 + ... A Benchmark for Tracking Any Point in a Video | 1 + ...ning using Bootstrapped Neural Tangent Kernels | 1 + ... Text Generation of Pretrained Language Models | 1 + ...ion Transformer with Noun-Pronoun Distillation | 1 + ...-KNN: K Nearest Neighbor Search at Peak FLOP s | 1 + ...sient Redundancy Elimination-based Convolution | 1 + ... and its Application to Reinforcement Learning | 1 + .../TUSK: Task-Agnostic Unsupervised Keypoints | 1 + .../TVLT: Textless Vision-Language Transformer | 1 + .../TaSIL: Taylor Series Imitation Learning | 1 + ...Neural Architecture Search on Tabular Datasets | 1 + ...taset for Chinese Vision-Language Pre-training | 1 + ...nfinite Variance) Noise in Federated Learning" | 1 + ...ge Objects without Explicit Goal Specification | 1 + ...alignment in truncated kernel ridge regression | 1 + ...g the Tasks that Neural Networks Generalize on | 1 + .../2022/neurips/Task-Agnostic Graph Explanations | 1 + ...rning via Online Discrepancy Distance Learning | 1 + ...ask-level Differentially Private Meta Learning | 1 + ...ndistillable Classes in Knowledge Distillation | 0 ... Recovers Reward Functions for Text Generation | 1 + ...namically Evolving and Newly Emerging Entities | 1 + ...eural Network with Optimal Transport Distances | 1 + ...el Training through Memory Footprint Reduction | 1 + ...Batch Normalization in Spiking Neural Networks | 1 + ...low Processing Mechanisms in Sequence Learning | 1 + ...emporally Disentangled Representation Learning | 1 + .../Temporally-Consistent Survival Analysis | 1 + ...pose Benchmark Dataset for Recommender Systems | 1 + ...ogram Optimization with Probabilistic Programs | 1 + ...position and Its Tensor Completion Application | 1 + ...st Time Adaptation via Conjugate Pseudo-labels | 1 + ...-Shot Generalization in Vision-Language Models | 1 + .../Test-Time Training with Masked Autoencoders | 1 + .../neurips/Text Classification with Born's Rule | 1 + ...al Prototype Matching for Video-Text Retrieval | 1 + ...Corpus: A 1.6TB Composite Multilingual Dataset | 1 + ... 
can fail even above the Barvinok-Pataki bound | 1 + ...: Rate of Differentiating Through Optimization | 1 + ...aphic and Socioeconomic Diversity of the World | 1 + ...tion and Data Augmentation are Class Dependent | 1 + ...gh-Order Methods in Smooth Convex Optimization | 1 + ...y-Convex-Strongly-Concave Minimax Optimization | 1 + ...onal Trade-offs in High Dimensional Statistics | 1 + .../The Gyro-Structure of Some Matrix Manifolds | 1 + data/2022/neurips/The Hessian Screening Rule | 1 + ...tion in Evaluating Deep Reinforcement Learning | 1 + data/2022/neurips/The Implicit Delta Method | 1 + ...ad in Non-contrastive Self-supervised Learning | 1 + ...moting Collaborative Metric Learning Algorithm | 1 + ...Reciprocal Twin of Invariant Risk Minimization | 1 + ...lti-step Distributional Reinforcement Learning | 1 + ...ite Depth-and-Width Networks at Initialization | 1 + ...e Neural Testbed: Evaluating Joint Predictions | 1 + data/2022/neurips/The Phenomenon of Policy Churn | 1 + ...ls of Regularization in Off-Policy TD Learning | 1 + ...rative Routing Neural Networks for Chip Design | 1 + ...ng for Linear Regression under Covariate Shift | 1 + ...Privacy Onion Effect: Memorization is Relative | 1 + .../neurips/The Query Complexity of Cake Cutting | 2 ++ ...e of Baselines in Policy Gradient Optimization | 1 + ...Complexity of One-Hidden-Layer Neural Networks | 1 + ...Sequence Length Warmup for Training GPT Models | 1 + ...veness of PPO in Cooperative Multi-Agent Games | 1 + ...of Fully-Connected Layers for Low-Data Regimes | 1 + ...ns in Few-shot Prompting for Textual Reasoning | 1 + ...helps select flat minima: A stability analysis | 1 + ...d learning benefits of Daleian neural networks | 1 + ...ol principle for local learning at equilibrium | 1 + ...noise structure in low-rank matrix estimation? | 1 + ...airness in linear bandits with biased feedback | 1 + ...on models : 100GB to 10MB Criteo-tb DLRM model | 0 ...networks for temporally dependent observations | 1 + ... with Smoothness-Aware Quantization Techniques | 1 + ...Theoretically Provable Spiking Neural Networks | 1 + ...anched Optimal Transport with Multiple Sources | 1 + ...rary for Differentiable Nonlinear Optimization | 1 + ...eralized Linear Stochastic Convex Optimization | 1 + ...for sparse graphs with overlapping communities | 1 + ...ZCZE, a comprehensive NLP benchmark for Polish | 1 + ...iciently Learns to Control Diffusion Processes | 1 + ... Language Models and Automated Theorem Provers | 1 + ...in the Face of Uncertainty and Constant Regret | 1 + ...radient Methods For Nonconvex Minimax Problems | 1 + ...rantees for Zero-Shot Learning with Attributes | 1 + ...With Contrastive Fenchel-Legendre Optimization | 1 + ... Transport Robust under Martingale Constraints | 1 + ...nvolution Networks for Time-series Forecasting | 0 ... update? Neurons at equilibrium in deep models | 1 + ...ingerprinting in Computer-Aided Drug Discovery | 1 + ...Token-level Data Augmentation for Transformers | 1 + data/2022/neurips/Top Two Algorithms Revisited | 1 + ...l Diffusion for Molecular Conformer Generation | 1 + ...lf-Portrait Videos of Faces, Hands, and Bodies | 1 + ...Learning from Human-Collected Vision and Touch | 1 + ...ient Descent and Discretization Error Analysis | 1 + ...eural Network Against Adversarial Perturbation | 1 + ...eged Features Distillation in Learning-to-Rank | 1 + ...ing in the brain with self-supervised learning | 1 + ... 
Better Evaluation for Dynamic Link Prediction | 1 + ...ards Consistency in Adversarial Classification | 1 + ...ntangling Information Paths with Coded ResNeXt | 1 + ...ot Adaption of Generative Adversarial Networks | 1 + ... in Zero-Resource Sounding Object Localization | 1 + ...D Object Detection with Knowledge Distillation | 1 + ...ng Quantization of Pre-trained Language Models | 1 + ...on via 3D-aware Global Correspondence Learning | 1 + ...erous Manipulation with Reinforcement Learning | 1 + ...bration in Object Detection Under Domain Shift | 1 + ...ving Faithfulness in Abstractive Summarization | 1 + ...al Hyperparameter Optimizers with Transformers | 1 + ... Black-Box Attack Against Deep Neural Networks | 0 ...plexity in Distributed Non-Convex Optimization | 1 + ...equential Event Prediction: A Causal Treatment | 1 + ...rol of Singular Values of Convolutional Layers | 1 + ...nsductive Minimum Description Length Inference | 1 + ...ed Graph Structure Attacks via Gradient Debias | 1 + ...nference with Balanced Neural Ratio Estimation | 1 + ...e Restoration with Codebook Lookup Transformer | 1 + ...forcement Learning with a Safety Editor Policy | 1 + ...ly Inspired Neural Initialization Optimization | 1 + ...rs' Reasoning with Deep Reinforcement Learning | 1 + ...An Effective Theory of Representation Learning | 1 + ...nsation of Neural Networks at Initial Training | 1 + ... the Mixture-of-Experts Layer in Deep Learning | 1 + .../neurips/Towards Versatile Embodied Navigation | 1 + ...ual Question Answering: Benchmark and Baseline | 1 + ...mance Evaluation Protocol for Cooperative MARL | 1 + ...Variable Selection with Theoretical Guarantees | 1 + ...mble Bayesian Model for Robust Neural Decoding | 1 + ...iational Inference in Bayesian Neural Networks | 1 + .../Tractable Optimality in Episodic Latent MABs | 1 + ...in Shapley-Fair Collaborative Machine Learning | 1 + ...ff Resource Budgets For Improved Regret Bounds | 1 + ...ry with Regularized Deterministic Autoencoders | 1 + ...ness, and Complexity in Emergent Communication | 1 + ...orks on the Sphere Can Happen in Three Regimes | 1 + ...ral Networks with Event-driven Backpropagation | 1 + ...ing Neural Networks with Local Tandem Learning | 1 + ...Training Subset Selection for Weak Supervision | 1 + ...e Classifiers with Conformalized Deep Learning | 1 + ... Any-Order Autoregressive Models the Right Way | 1 + ...els to follow instructions with human feedback | 1 + ...upralinear networks by dynamics-neutral growth | 1 + ...Injected and Natural Backdoors During Training | 1 + ...nference via Mean-field Langevin in Path Space | 1 + ...lance: Improved credit assignment in GFlowNets | 1 + ... 
Saturation and Convergence in High Dimensions | 1 + ...tonomous Driving: A Simple yet Strong Baseline | 1 + ...t ImageNet Performance using Deep Transduction | 1 + ...ransferable Tabular Transformers Across Tables | 1 + ...entence Scoring with Sliding Language Modeling | 1 + ...eature Spaces for Treatment Effects Estimation | 1 + ...ion Shifts via Fair Consistency Regularization | 1 + ...entations with Cross-modal Similarity Matching | 1 + ...fficient Operator Learning in Frequency Domain | 1 + ...former Memory as a Differentiable Search Index | 1 + ...ent Reinforcement Learning with Action Parsing | 1 + .../Transformers from an Optimization Perspective | 1 + ...Attention with Data-Adaptive Sparsity and Cost | 1 + ...works with Directed Acyclic Graph Architecture | 1 + ...works with a Continuous Manifold of Attractors | 1 + ...apping Them into an Easy-to-Replace Subnetwork | 1 + ...Metrics and Stability of Graph Neural Networks | 1 + ...th known constraints over mixed-feature spaces | 1 + ...tive Neuron Morphology Representation Learning | 1 + ...ngulation candidates for Bayesian optimization | 1 + ...Estimation for Robust Generalized Linear Model | 0 .../Truly Deterministic Policy Optimization | 1 + ...ower Iteration for Differentiable DAG Learning | 1 + ...ble and hassle-free simulation-based inference | 1 + ...: Duality and Algorithm for Continuous Actions | 1 + data/2022/neurips/Trustworthy Monte Carlo | 0 ...Machine for Solving Contextual Bandit Problems | 1 + ..., CEs and CCEs with Neural Equilibrium Solvers | 1 + ...ed, Dynamic Tabular Datasets for ML Evaluation | 1 + ...End to End Entity Linking Benchmark for Tweets | 1 + ...-22: Towards Graph-Based Twitter Bot Detection | 1 + ... for Sign Language Recognition and Translation | 1 + ...ptimization guarantee in the mean-field regime | 1 + ...ible TinyML Models for Neural Processing Units | 1 + ...Neural Fields for Mix-and-Match Virtual Try-On | 1 + ...ubpopulation Shift via Uncertainty-Aware Mixup | 1 + ... Deep Classifiers trained via Conditional GANs | 1 + ...pervised Learning Benchmark for Classification | 1 + ...Approach for Vision with Learned Guiding Codes | 1 + ...ated Models Can Improve Human-AI Collaboration | 1 + ...el Dynamics for Offline Reinforcement Learning | 1 + ...ew Data: The Power of Seeing the Whole Picture | 1 + ... Incremental Implicitly-Refined Classification | 1 + ...isk-Sensitive Player Evaluation in Sports Game | 1 + ...with O(log T) Swap Regret in Multiplayer Games | 1 + ...uctural Fairness in Graph Contrastive Learning | 1 + ...hoto Critique Dataset for Aesthetic Assessment | 1 + ...gn Overfitting in Gradient-Based Meta Learning | 1 + ...d on Domain Similarity and Few-Shot Difficulty | 1 + ...tive Learning via Coordinate-wise Optimization | 1 + ...al Computing for Parallel Single-Pass Learning | 1 + ...tworks from the Bayesian-Inference Perspective | 1 + ...upervision via Source-aware Influence Function | 1 + ...hrough the Lens of Representation Similarities | 1 + ...ng Overparametrized Neural Network Classifiers | 1 + ...g Subgraph GNNs by Rethinking Their Symmetries | 1 + ...mers through Patch-based Negative Augmentation | 1 + .../neurips/Understanding the Eluder Dimension | 1 + ... 
Linear Regions in Deep Reinforcement Learning | 1 + ...of Batch Normalization for Transformers in NLP | 1 + ...t of Normalization Layers: Sharpness Reduction | 1 + ...c 2D Prediction for Multi-Stage Classification | 1 + ...Sparse Generalist Models with Conditional MoEs | 1 + ...rk for Contrastive Language-Image Pre-training | 1 + ...ode Collapse in GANs using a Uniform Generator | 1 + ...fied Inference in Sequential Decision Problems | 1 + ...port Framework for Universal Domain Adaptation | 1 + ...ation with Transformer for 3D Object Detection | 1 + ...Based Training-Free Neural Architecture Search | 1 + .../Universal Rates for Interactive Learning | 0 ... Networks Based on Ridgelet Analysis on Groups | 1 + ...nication in Multi-Agent Reinforcement Learning | 1 + ...sarial Learning for Open-Set Domain Adaptation | 1 + ...tersection-Closed Classes and Extremal Classes | 1 + ...its of Reward Engineering on Sample Complexity | 1 + ...ed Graph Neural Networks for Dynamical Systems | 1 + ...rom Repeated Traversals for Autonomous Driving | 1 + ...ised Causal Generative Understanding of Images | 1 + ...Task Generalization via Retrieval Augmentation | 1 + ...Semantic Segmentation using Depth Distribution | 1 + ...anslation with Density Changing Regularization | 0 ...m Incomplete Measurements for Inverse Problems | 1 + ...imization with Principled Objective Relaxation | 1 + ...arning of Equivariant Structure from Sequences | 1 + ...roup Invariant and Equivariant Representations | 1 + ... Shape Programs with Repeatable Implicit Parts | 1 + ...Unsupervised Learning under Latent Label Shift | 1 + ...ntation by Predicting Probable Motion Patterns | 1 + ... Segmentation Using Radiance Field Propagation | 1 + ...Object Priors Generation and Detector Learning | 0 ...Translation and Rotation Group Equivariant VAE | 1 + ...by Generative Adversarial Autoencoding Network | 1 + ...nt Learning with Contrastive Intrinsic Control | 0 ...rom Pre-trained Diffusion Probabilistic Models | 1 + ...d Skill Discovery via Recurrent Skill Training | 1 + ... via Mutual Information Regularized Assignment | 1 + ...less and Stealthy Dataset Copyright Protection | 1 + data/2022/neurips/Uplifting Bandits | 1 + ...rounded Simulations for Explanation Evaluation | 1 + ...stimation of Peer Influence in Social Networks | 1 + ...rove Accuracy & Out-of-Distribution Robustness | 0 ...artial Monotonicity in Submodular Maximization | 1 + ... to instill human inductive biases in machines | 1 + ...toencoders and Probabilistic Logic Programming | 1 + .../neurips/VCT: A Video Compression Transformer | 1 + ...rgence of Navigation in Embodied Rearrangement | 1 + ... Federated Learning, Efficiently and Securely? | 1 + ...: Variational Interpretable Concept Embeddings | 1 + ...f-Supervised Learning of Local Visual Features | 1 + ...ance Segmentation via Object Token Association | 1 + ...Benchmark for Vision-and-Language Manipulation | 1 + ... 
Pre-Training with Mixture-of-Modality-Experts | 1 + ...amework for Visual Deep Reinforcement Learning | 1 + ...rmer Compression with Low-Frequency Components | 1 + ...tional Inference Based Algorithm for Phylogeny | 1 + ...rative Design of Reinforcement Learning Agents | 1 + ...CPC leads to acoustic unit discovery in speech | 1 + ..., Theory and Application to Federated Learning | 1 + ...Perturbation for Source-Free Domain Adaptation | 1 + ...ional inference via Wasserstein gradient flows | 1 + ...for Rotation Equivariant Geometry Optimization | 1 + ...rk for Authorship Verification on the Dark Web | 1 + ...fication and search algorithms for causal DAGs | 1 + ...raph Neural Network for Circuit Representation | 1 + ...oNS: Visual Search in Natural Scenes Benchmark | 0 ...ransformer Baselines for Human Pose Estimation | 1 + data/2022/neurips/Video Diffusion Models | 1 + ...ing to Act by Watching Unlabeled Online Videos | 1 + ...chmark of learning-based video-quality metrics | 1 + ...ject Interaction Detection from Tubelet Tokens | 1 + ...earners for Self-Supervised Video Pre-Training | 1 + ...f Visual Recognition to Adversarial Viewpoints | 1 + ...Reconstruction with Viscosity and Coarea Grids | 1 + ...ion with Right-for-the-Right-Reason Objectives | 1 + .../Vision GNN: An Image is Worth Graph of Nodes | 1 + ... Transformers provably learn spatial structure | 1 + ...age Foundations for Image Paragraph Captioning | 1 + data/2022/neurips/Visual Concepts Tokenization | 1 + .../neurips/Visual Prompting via Image Inpainting | 1 + ...prove AI robustness and human-AI team accuracy | 1 + ...Adversarial Attacks with Audio-to-Audio Models | 1 + ...-Aware Image Synthesis with Sparse Voxel Grids | 1 + ...indow-based Transformers for Multi-view Stereo | 1 + ...n The (Im)possibility of Fairwashing Detection | 0 ...means for clustering probability distributions | 1 + ...n Iterative Networks for Barycenter Estimation | 1 + ...rstein Logistic Regression with Mixed Features | 1 + ...Watermarking for Out-of-distribution Detection | 1 + ...rror Bounds for Stable Time Series Forecasting | 1 + ...ature Maps Compression for Image-to-Image CNNs | 1 + .../Wavelet Score-Based Generative Modeling | 1 + ...ntic Segmentation via Dual Similarity Transfer | 1 + ...resentation Learning with Sparse Perturbations | 1 + ...akly supervised causal representation learning | 1 + ...ap Learning for Vision-and-Language Navigation | 1 + ... Web Interaction with Grounded Language Agents | 1 + .../Weighted Distillation with Unlabeled Examples | 1 + ...arning with Diversity-Driven Model Compression | 1 + ...d improving Shapley based feature attributions | 1 + ...eman Go Walking: Random Walk Kernels Revisited | 1 + ...ntext? A Case Study of Simple Function Classes | 1 + ...t Kernel Tell Us About Adversarial Robustness? | 1 + ...valuation Framework for Explainability Methods | 1 + ...hat Makes Graph Neural Networks Miscalibrated? | 1 + ...edge Distillation - A Statistical Perspective" | 1 + ...e is What You Classify: Black Box Attributions | 1 + ...eep Learning via Distributional Generalization | 1 + ... Systems? New Perspectives on NLP Benchmarking | 1 + ...pen-World Phrase-Grounding without Text Inputs | 1 + ...c to Study Generalization of Minimax Learners? | 1 + ... the Fraction Negatively Affected by Treatment | 1 + ...formers: Recipes from Training to Architecture | 1 + ...l Thompson Sampling meets Approximation Regret | 1 + .../neurips/When Do Flat Minima Optimizers Work? | 1 + ...rivate Learning Not Suffer in High Dimensions? 
| 1 + ...ariant Learning Survive Spurious Correlations? | 1 + ...ned Analysis of Differentially Private Bandits | 1 + ... are Local Queries Useful for Robust Learning? | 1 + ...ine Two-Player Zero-Sum Markov Games Solvable? | 1 + ...? Analyzing the remaining mistakes on ImageNet | 1 + ...rning work for offline reinforcement learning? | 1 + ...rventions in Autonomous Reinforcement Learning | 1 + ...imal Intervention Policies for Critical Events | 1 + ...age Models as Accounts of Human Moral Judgment | 1 + ...brid Offline-and-Online Reinforcement Learning | 1 + ...Constrained Model-based Reinforcement Learning | 1 + ...rameter-Space Saliency Maps for Explainability | 1 + ...tion in Sparse Training for Feature Selection? | 1 + ...orative Perception via Spatial Confidence Maps | 1 + ...ective to Characterizing Post Hoc Explanations | 1 + ...gence Rate of Coupling-based Normalizing Flows | 1 + ...lly Generated Data Help Adversarial Robustness | 0 ... is Difficult: Perspective of Expressive Power | 1 + ... Ensembles, and Why Their Independence Matters | 1 + ...trastive Learning? A Gradient-Bias Perspective | 1 + ...perform deep learning on typical tabular data? | 1 + ... The many regularizers of geometric complexity | 1 + ...rk of in-the-Wild Distribution Shift over Time | 1 + .../Will Bilevel Optimizers Benefit from Loops | 1 + ...chmark to Challenge Vision-and-Language Models | 1 + ...ale Chinese Cross-modal Pre-training Benchmark | 1 + ...trained Transformers Made Simple and Efficient | 1 + ...ormers and RvS Fail in Stochastic Environments | 1 + ...ation via Bank-constrained Manifold Projection | 0 ... Live Once: Single-Life Reinforcement Learning | 1 + ...f-Distribution Detection Method is Not Robust! | 1 + ...ansformer May Not be as Powerful as You Expect | 1 + ...er Optimization for Neural Architecture Search | 1 + ...earn Invariance Without Environment Partition? | 1 + ...al Navigation using Multimodal Goal Embeddings | 1 + ...hot 3D Drug Design by Sketching and Generating | 1 + ...ering via Frozen Bidirectional Language Models | 1 + .../neurips/Zero-Sum Stochastic Stackelberg Games | 1 + ...ogeneous Graph via Knowledge Transfer Networks | 1 + ... Recognition and Acquisition at Inference Time | 1 + ...ning Quantization for Large-Scale Transformers | 1 + ...d-Thresholding: Gradient Error vs. Expansivity | 1 + ...ding: Escaping Saddle Points without Gradients | 1 + ...ins for Lagrangian Neural Network Verification | 1 + ...del Zoo for Out-of-Distribution Generalization | 1 + data/2022/neurips/coVariance Neural Networks | 1 + ...aset using mmWave, RGB-D, and Inertial Sensors | 1 + ... 
Benchmark for Personalized Federated Learning | 1 + ...r training deep networks with unitary matrices | 1 + ...k Deep Learning based Knowledge Tracing Models | 1 + ...g And Zero-Shot Transfer to Unlabeled Modality | 1 + ...ctivity Using Synthetic Aperture Radar Imagery | 1 + ...Scale Embodied AI Using Procedural Generation" | 1 + ...Automatically Terminate Bayesian Optimization" | 1 + ...zation: A unified approach to private training | 1 + ...g with Bidirectional Communication Compression | 1 + ...D molecule generation by denoising voxel grids | 1 + ...ecting the 3D World into Large Language Models | 1 + .../neurips/4D Panoptic Scene Graph Generation | 1 + ...ing Training Data Attribution In Deep Learning | 1 + ...y Estimation for Computerized Adaptive Testing | 1 + ...s-Moment Approach for Causal Effect Estimation | 1 + ...ed Class Incremental Learning for Vision Tasks | 1 + ...reaming Media Performance over HTTP 3 Browsers | 1 + ...iffusion-Model of Joint Interactive Navigation | 1 + ...tem View of Langevin-Based Non-Convex Sampling | 1 + ...ependent Learning in Zero-Sum Stochastic Games | 1 + ...tiparameter Persistent Homology Decompositions | 1 + ... Robust G-Invariance in G-Equivariant Networks | 1 + ...Correct, Incorrect, and Extrinsic Equivariance | 1 + ...t Extraction and Concept Importance Estimation | 1 + ...antic Similarity Dataset of Historical English | 1 + ... Measure-Theoretic Axiomatisation of Causality | 1 + ... Posterior Sampling in Image Recovery Problems | 1 + ...ing the Projection Robust Wasserstein Distance | 1 + ...m for the Euclidean Bipartite Matching Problem | 1 + ...twork for DSIC Affine Maximizer Auction Design | 1 + ...o Find a Causal Order in Additive Noise Models | 1 + ...ty Bounds for Constrained Minimax Optimization | 1 + ...variance Estimation in Relative Frobenius Norm | 1 + ...timization in Linear Markov Decision Processes | 1 + ...of Skills based on Determinantal Point Process | 1 + ...ery in Nonlinear Generative Compressed Sensing | 1 + ...Model and Dimension for Interactive Estimation | 1 + ...lable Framework for Neural Population Decoding | 1 + ...on: Game Dynamics for Multi-Objective Learning | 1 + ... optimize time-space tradeoff for large models | 0 ...aphon-signal analysis of graph neural networks | 1 + ...xpressive 3D equivariant graph neural networks | 1 + ...r learning to represent visual transformations | 1 + ...or information-theoretic generalization bounds | 1 + ...Gym: Design Choices for Deep Anomaly Detection | 1 + ...Gradient Difference for Preconditioning Matrix | 1 + ...Control in High-Dimensional Mediation Analysis | 1 + ...ral Programming with Interactive Decomposition | 1 + ...nsor Networks for Quantum Many-Body Simulation | 1 + ...Benchmarking Tool for Label Quality Assessment | 1 + ...Regressive Diffusion Model for Text Generation | 1 + ...r Advancing Isolated Sign Language Recognition | 1 + ...icient Parallelization of Deep Neural Networks | 1 + ...hrough Memory Efficient Attention Manipulation | 1 + ...wing Instructions with Latent Diffusion Models | 1 + ...mation Seeking with Large Language Model Agent | 1 + ... Claim Verification with Evidence from the Web | 1 + ... 
generation of in-vitro functioning antibodies | 1 + ...ining with Module-Wise Descending Asynchronism | 0 ...ex Optimization Problem with Infinite Variance | 1 + ...lerating Motion Planning via Optimal Transport | 1 + ...r Dimensions for Unsupervised Word Translation | 1 + ...ization with Multimodal Unified Representation | 1 + ...t Imitation from Observation with World Models | 1 + ...nt Learning under Limited Visual Observability | 1 + ...neral task space with applications in robotics | 1 + ...vity Grammars for Temporal Action Segmentation | 1 + ...ANNS: A Framework for Adaptive Semantic Search | 1 + ...ors for Data-Efficient Complex Query Answering | 1 + ...ent Regression with Applications to Panel Data | 1 + ...vacy Composition for Accuracy-first Mechanisms | 1 + ...st-Time Personalization for Federated Learning | 1 + ...omputation scaling to unseen difficulty levels | 1 + ...d Thin: Diffusion for Temporal Point Processes | 1 + ...dressing Negative Transfer in Diffusion Models | 1 + ...rial Counterfactual Environment Model Learning | 14 ++++++++++++++ ... Networks for Low Dimensional Linear Subspaces | 1 + ...d Generalization for Gradual Domain Adaptation | 1 + ...versarial Training from Mean Field Perspective | 0 ...ount Tracking via Partial Differential Privacy | 1 + ... under Budget Constraint for Online Algorithms | 0 data/2023/neurips/Affinity-Aware Graph Networks | 1 + ...atter Dataset From Delhi For ML based Modeling | 1 + ... Stationary Distribution Correction Estimation | 1 + ...ibution Alignment for Zero-Shot Generalization | 1 + ...nd Hessian for Neural Signed Distance Function | 1 + ...with Human Preferences via a Bayesian Approach | 1 + ...resentations supports robust few-shot learning | 1 + ...lignment for Weakly-supervised 3D Segmentation | 1 + ...f-Experts for Integrated Multimodal Perception | 1 + ...Alternating Updates for Efficient Transformers | 1 + ...makes the adversary weaker in two-player games | 1 + ...red Text Dataset of Historical U.S. Newspapers | 1 + ...Scalable Variational Inference for Latent SDEs | 1 + ...under the Polyak-Łojasiewicz Condition | 0 ...ugin and Its Application to Continual Learning | 0 ...roach to Best of Both Worlds in Linear Bandits | 1 + ...racle-Efficient Adversarial Contextual Bandits | 1 + ...n Networks: Approximating Equitable Partitions | 1 + ...content of communication between brain regions | 1 + ...n of Neural Networks through Loss Path Kernels | 1 + ...f Self-Supervised Image Reconstruction Methods | 1 + data/2023/neurips/Anchor Data Augmentation | 1 + ...d Copy-Robust Delegations for Liquid Democracy | 1 + .../Anytime Model Selection in Linear Bandits | 1 + ...itive Reinforcement Learning with Policy Prior | 1 + ...ral Causal Bandits with Unobserved Confounders | 1 + ...nference of marginals using the IBIA framework | 1 + ...iffusion Models Vision-And-Language Reasoners? | 1 + ... More Data Hungry Than Newborn Visual Systems?
| 1 + ...Blind Omnidirectional Image Quality Assessment | 1 + data/2023/neurips/Auditing for Human Expertise | 1 + ...gmenting Language Models with Long-Term Memory | 1 + ...e Translation for Daily Communication and News | 1 + ...raph Optimization for Neural Network Evolution | 1 + .../Autodecoding Latent 3D Diffusion Models | 1 + ...and Benchmarking Agents that Solve Fuzzy Tasks | 1 + ...ransformer for Cross-data Learning in the Wild | 0 ...or Efficient Neural Combinatorial Optimization | 1 + ...ing Modal Transformation for Data Augmentation | 0 ...d-Bandit Strategy for Automated Phased Release | 0 ...loud Segmentation with Inter-Part Equivariance | 1 + ...t Task Assignment with Unknown Processing Time | 0 .../BanditPAM++: Faster k-medoids Clustering | 1 + ...casting with Learnable and Interpretable Basis | 1 + ...esTune: Bayesian Sparse Deep Model Fine-tuning | 1 + ...usal Discovery with Multi-Fidelity Experiments | 0 .../Bayesian Learning via Q-Exponential Process | 1 + ... Uncertainty Quantification in Image Retrieval | 1 + ...-Averse Q-Learning with Streaming Observations | 1 + ...s for analyzing neural spike train variability | 1 + ...lignment of LLM via a Human-Preference Dataset | 1 + ...tion Models with Language-Model-as-an-Examiner | 1 + ...ssion Through Better Private Feature Selection | 1 + ...ntralized Learning via Finite-time Convergence | 1 + ...abeled Data through Holistic Predictive Trends | 1 + ...luation Processes: An Optimization-Based Model | 1 + ...on Algorithms for the Submodular Cover Problem | 1 + ...nsional Mechanism Design with Side Information | 1 + .../Bifurcations and loss jumps in RNN training | 1 + ...mation using Multi-modal Satellite Time-series | 0 ...Recovery: A Novel Benchmark Dataset and Method | 1 + ...ck-Box Differential Privacy for Interactive ML | 1 + ...ug-and-Play Methods for Blind Inverse Problems | 1 + data/2023/neurips/Block-State Transformers | 1 + ...Transferability by Achieving Flat Local Maxima | 1 + ...ta via Kernel Correction and Affinity Learning | 0 ...tor for Offline Design of Biological Sequences | 1 + ...ng-Free Semantic Control with Diffusion Models | 1 + ...ounding training data reconstruction in DP-SGD | 1 + ... Discovery using Large Scale Generative Models | 1 + ...-Accuracy Tradeoff with f-Differential Privacy | 1 + ... 3D Scene Understanding with Foundation Models | 1 + ... localization from dense multielectrode probes | 1 + ...-Optimal Adversarial Linear Contextual Bandits | 1 + ... Factors under an Inductive Bias of Confounder | 1 + ...uning for Highly-Accurate Sparse Vision Models | 1 + ...ynamics Under Temporal Environmental Variation | 1 + ...in Space and Time for Video Action Recognition | 1 + ...EIL: Generalized Contextual Imitation Learning | 1 + ... channel-adaptive models in microscopy imaging | 1 + ...d Counterfactual Examples for Image-Text Pairs | 1 + ...Benchmark for Continual Reinforcement Learning | 1 + ...ed Deep Offline Reinforcement Learning Library | 1 + ...orcement Learning with a Quantized World Model | 1 + ...ollable, Robust and Secure Image Steganography | 1 + ...er with Continuously Weighted Contrastive Loss | 1 + .../Cal-DETR: Calibrated Detection Transformer | 1 + ... GNN Over Multi-Relational and Temporal Graphs | 1 + ...Matching: Trainable Kernel Calibration Metrics | 1 + ...for Generating Camoflauged Adversarial Patches | 1 + ...ion Frequency in Delayed Feedback Environments | 0 ...terventional data across multiple environments | 1 + ...iven Augmentations for Text OOD Generalization | 1 + ... 
of Causal Graphs with Small Conditioning Sets | 1 + ...verfitting in Robust Multiclass Classification | 0 ...on Shift: A Model Weight Perturbation Approach | 1 + ...rarchical Comparisons for Image Classification | 1 + ...e Instruction Tuning for Large Language Models | 1 + ... Metrics for Autoregressive Transformer Models | 0 ...ounding Dataset on City-scale Point Cloud Data | 1 + ...itional Conformal Prediction with Many Classes | 1 + ... Circuits: Parallel Environments and Benchmark | 1 + ...hine Learning for Weather and Climate Modeling | 1 + ...omer: Clustering As A Universal Visual Learner | 0 ...etch: Dynamic Compression for Embedding Tables | 1 + ...gnment for Open-vocabulary 3D Object Detection | 1 + ...Alignment for Open-Vocabulary Object Detection | 1 + ... for Communication-Efficient Private Inference | 1 + ...via Long-Range Modulatory Feedback Connections | 1 + .../neurips/Collaborative Alignment of NLP Models | 0 ...ollaborative Learning via Prediction Consensus | 1 + ...ore Distillation for Consistent Visual Editing | 0 ...ing Linear Models with Structured Missing Data | 1 + ...Collapsed Inference for Bayesian Deep Learning | 1 + ...c Video Representations with Dynamic Codebooks | 0 ...s, Structural Models, Graphs, and Abstractions | 1 + ...ing and Self-Training Under Distribution Shift | 1 + ...ing Neural Networks: Smoothness and Degeneracy | 1 + ...Score-Based) Text-Controlled Generative Models | 1 + ...an-Centered Explanations for Model Improvement | 1 + ...fficient Transfer Learning with Fast Inference | 1 + ...gled Representations in Reinforcement Learning | 1 + ...y-Aware Planning with Diffusion Dynamics Model | 1 + ...arning and its Application to Minimax Theorems | 1 + .../Connecting Certified and Adversarial Training | 1 + ... Estimation for Offline Reinforcement Learning | 1 + ...pic Gaussian Diffusion Model for Image Editing | 1 + ...uction for Offline Meta-Reinforcement Learning | 1 + ...fective Topic Modeling in Low-Resource Regimes | 1 + .../neurips/Context-lumpable stochastic bandits | 1 + ... Learning with Preference-Based Active Queries | 1 + .../Contextual Stochastic Bilevel Optimization | 1 + ...Continuous-Time Functional Diffusion Processes | 1 + ...pervised Halfspace Learning in Polynomial Time | 1 + ...ontrastive Sampling Chains in Diffusion Models | 0 ...xt-to-Image Diffusion by Orthogonal Finetuning | 1 + ...-Concave Zero-Sum Stochastic Stackelberg Games | 0 ...ators for robust and accurate learning of PDEs | 1 + ... Models for Long-Range Spatiotemporal Modeling | 1 + ...e-sets for Fair and Diverse Data Summarization | 1 + ...d Deep Neural Networks without Weight Symmetry | 1 + ...rrespondence Priors for Neural Radiance Fields | 1 + ...for Offline Multi-agent Reinforcement Learning | 1 + ... Evaluation of Peer-Review Assignment Policies | 1 + ...cal Surfaces by Diffeomorphic Mesh Deformation | 1 + .../Covariance-adaptive best arm identification | 1 + ...el Skill Hierarchies in Reinforcement Learning | 1 + ...g a Public Repository for Joining Private Data | 0 ...icy Adaptation via Value-Guided Data Filtering | 1 + ...nking the Debiased GNN from a Data Perspective | 1 + ...eural Network via Discrete Denoising Diffusion | 0 ...-DETR: Divide the Attention Layers and Conquer | 0 ...or visual understanding of mixture-of-datasets | 1 + ... and High-quality Speech-to-Speech Translation | 1 + ... 
for Vector Set Search with Vector Set Queries | 1 + ...e Replay in Multi-Agent Reinforcement Learning | 1 + ...ffusion Solvers for Combinatorial Optimization | 1 + .../DISCS: A Benchmark for Discrete Sampling | 1 + ...al Multi-Dataset Multi-Task Segmentation Model | 1 + .../neurips/Data Quality in Imitation Learning | 1 + ... for Language Models via Importance Resampling | 1 + ... Linear Systems with Unknown Noise Covariances | 1 + ...neration for Pixel-Level Semantic Segmentation | 1 + .../Debiasing Conditional Stochastic Optimization | 1 + ... Multi-armed Bandit with Heterogeneous Rewards | 1 + ...rcement Learning via Modular Generative Models | 1 + ...s and AIs on the Many Facets of Working Memory | 1 + ... Concept Learning For 3D Novel Class Discovery | 0 ...Subtasks in Multi-Agent Reinforcement Learning | 1 + ...eep Contract Design via Discontinuous Networks | 1 + .../neurips/Deep Fractional Fourier Transform | 0 ... Fields for Graph-Structured Dynamical Systems | 1 + ...ights into Noisy Pseudo Labeling on Graph Data | 1 + ...imal for the Deep Unconstrained Features Model | 1 + ...ced Ant Systems for Combinatorial Optimization | 1 + ...izing Sequential Operations in Neural Networks | 1 + ...Hand-Object Interaction via Physics Simulation | 1 + data/2023/neurips/Delegated Classification | 1 + ...n Graph Neural Networks: Can One Size Fit All? | 1 + ...) Promote Compositional Reasoning in VL Models | 1 + ...ric Learning for Monocular 3D Object Detection | 1 + ...ection with FDR control via conformal e-values | 1 + ...ing Object Detection with Flexible Expressions | 1 + ...Diffusion Transformers for 3D Shape Generation | 1 + ...ate Views using Neural Template Regularization | 0 ...o-Audio Synthesis with Latent Diffusion Models | 1 + ...ng Knowledge From Pre-trained Diffusion Models | 1 + ...ainst Diffusion-Based Adversarial Purification | 1 + ...Diffusion-based Generative 3D Shape Completion | 1 + ...allel Random Patch Diffusion in Histopathology | 1 + ... Trajectory with Diffusion Probabilistic Model | 1 + ...ble Clustering with Perturbed Spanning Forests | 1 + ...tiable sorting for censored time-to-event data | 0 ...gh β-Divergence One Posterior Sampling | 1 + ... Physics-Augmented Generative Diffusion Models | 1 + ...thesizer for Multi-Task Reinforcement Learning | 1 + ...elf-Guidance for Controllable Image Generation | 1 + ...ed Policy Optimization without Reward Modeling | 1 + ...ts can control the sign of synaptic plasticity | 1 + ...entanglement of Diffusion Probabilistic Models | 1 + ...ding for Multi-Instance Partial-Label Learning | 1 + ...Algorithms with Adversarial Environment Design | 1 + ...einforcement Learning via Contrastive Learning | 1 + ...-Temporal Logic Rules to Explain Human Actions | 1 + ...in Autoencoder for T-Cell Receptor Engineering | 1 + ...gnitive Diagnosis with Limited Exercise Labels | 0 ... with Self-Supervision for Speaker Recognition | 1 + ...ustness from Vision-Language Foundation Models | 1 + ...ing of Large Language Models Over The Internet | 1 + ...buted Personalized Empirical Risk Minimization | 1 + ...ets Towards Calibrated Sparse Network Training | 1 + ...tyle Modulated Generative Adversarial Networks | 1 + ...bution Detection via Informative Extrapolation | 1 + ...ts with Automatic Diffusion-based Augmentation | 1 + ...to-Image Alignment with Iterative VQA Feedback | 1 + ... Mixtures Speeds Up Language Model Pretraining | 1 + ...via Environment Augmentation Learn Invariance? | 1 + ...ork training problem always admit an optimum ?
| 1 + ...ut Learning due to Gradients and Cross Entropy | 1 + ...gnitude! Your mask topology is a secret weapon | 0 ... Generic Algorithm and Robust Partial Coverage | 1 + ...th Dimension-Independent Convergence Guarantee | 1 + ...lication to High-Dimensional Synthetic Control | 1 + ...ented Transfer for Meta-Reinforcement Learning | 1 + ...ble Multivariate Time Series Anomaly Detection | 1 + ...out Individual Global Max for Cooperative MARL | 1 + ...nce of Gene Regulatory Networks with GFlowNets | 1 + ...Point: Dynamic Neural Point For View Synthesis | 1 + ... and Interpretable Autoregressive Transformers | 1 + ...ed Learning with Adaptive Differential Privacy | 1 + ...amic Regret of Adversarial Linear Mixture MDPs | 1 + ...mic Sparsity Is Channel-Level Sparsity Learner | 1 + ...e Input Views with Monocular Depth Adaptation" | 1 + ...n with Spatio-Temporal Representation Learning | 1 + ...wering Dataset Combined With Electrocardiogram | 1 + ...nt Diffusion for Planning with Embodied Agents | 1 + ...l Waveform Inversion of Geophysical Properties | 1 + ...k for Few-Shot Evaluation of Foundation Models | 1 + .../ELDEN: Exploration via Local Dependencies | 1 + .../neurips/Easy Learning from Label Proportions | 1 + ...scedastic Regression with Deep Neural Networks | 1 + ...Shifts for Models with Different Training Data | 1 + ...on Sets in Hierarchical Reinforcement Learning | 0 ...earning via Robustness-Aware Coreset Selection | 1 + ...lized Linear Bandits with Heavy-tailed Rewards | 1 + ...Extrapolation using Prior-Data Fitted Networks | 1 + ...on Policies For Offline Reinforcement Learning | 1 + ...ear Graph Neural Networks via Node Subsampling | 0 ...icient Model-Free Exploration in Low-Rank MDPs | 1 + ...ning using Inverse Dynamic Bisimulation Metric | 0 ...an Optimization for Arbitrary Uncertain inputs | 1 + ...l Equations with Positive Semi-Definite Models | 1 + ...th Second-Order Degradation and Reconstruction | 1 + ...duction for Over-Parameterized Neural Networks | 1 + ...ions into geometric deep learning force fields | 1 + ...term Egocentric Visual Object Tracking Dataset | 1 + ...a Abnormal Adversarial Examples Regularization | 1 + ...al Neural Networks through Activation Sparsity | 1 + .../Emergent Communication for Rules Reasoning | 1 + ...volutional Neural Nets with MetaSin Activation | 0 ...ls for Inverse Problems in High Energy Physics | 1 + ...ntext Update in Text-to-Image Diffusion Models | 1 + .../Energy-Efficient Scheduling with Predictions | 1 + ...Incremental Learning with Data-free Subnetwork | 1 + ...urring in High-Speed Scenes with Spike Streams | 0 ...ware Optimization Through Variance Suppression | 1 + ...ural Optimal Transport via Diffusion Processes | 1 + ... Methods for Scalable Neural Implicit Samplers | 1 + ...k Learning with Heterogeneous Neural Processes | 1 + ...ual Opportunity of Coverage in Fair Regression | 1 + ...tation Learning from Imbalanced Demonstrations | 0 ... a Combination of Observations and Experiments | 0 ...o provably learn large scale dynamical systems | 1 + ...ric with Noise-Contaminated Intrinsic Distance | 0 ...Planning in Large Language Models with CogEval | 1 + ...ng Neuron Interpretation Methods of NLP Models | 1 + data/2023/neurips/Evaluating Open-QA Evaluation | 1 + ... Graph Neural Networks via Robustness Analysis | 0 ... 
Models Under Structural Distributional Shifts | 1 + ...rvised Learning for Molecular Graph Embeddings | 1 + ...Augmented Computation-Intensive Math Reasoning | 1 + ...hrough Explanation Invariance and Equivariance | 1 + ...g Small-Scale Datasets with Guided Imagination | 1 + ...erimental Designs for Heteroskedastic Variance | 1 + ...ed Auxiliary Feedback in Parameterized Bandits | 1 + ...nvex games for convergence to Nash equilibrium | 1 + ...d Training Strategy in Spiking Neural Networks | 1 + ...oring Question Decomposition for Zero-Shot VQA | 1 + ...Algorithms for Supervised Matrix Factorization | 1 + ...tion Glitches with Flip-Flop Language Modeling | 1 + ...and their unfair treatment of diffusion models | 1 + ...riant Networks for Spectral Geometric Learning | 1 + .../Expressivity-Preserving GNN Simulation | 0 .../FAMO: Fast Adaptive Multitask Optimization | 1 + ...uation of Open-Domain Text-to-Video Generation | 1 + ...chmark for Evaluating Interpretability Methods | 1 + ... Algorithm for Multinomial Logistic Regression | 0 ... in Federated Learning using Invariant Dropout | 1 + ...ld Model Backbones: RNNs, Transformers, and S4 | 1 + ...ussian Process Optimization with Regret Bounds | 1 + data/2023/neurips/Fair Graph Distillation | 1 + ...nalysis: Statistical and Algorithmic Viewpoint | 1 + ...lexible and Controllable Optimization Approach | 0 ...Scene Understanding in Open-World Environments | 1 + ...te: Limits of Transformers on Compositionality | 1 + ...ry Proportion control for aggregated Knockoffs | 1 + ...milarity Graphs with Kernel Density Estimation | 1 + .../Fast Model DeBias with Machine Unlearning | 1 + ...ision Matrix Estimation under Total Positivity | 1 + ... Best Order Score Search and Grow Shrink Trees | 1 + ...e Convex Optimization via Second-Order Methods | 1 + ...nimization with Predictions: The M-Convex Case | 1 + ...aster approximate subgraph counts with privacy | 1 + ...g for Interpretable, Performant Decision Trees | 1 + ...eralization of Generative Models Using Samples | 1 + ... Selection in the Contrastive Analysis Setting | 1 + ...Learning with Self-Adjusting Gradient Balancer | 1 + ...rated Training of Graph Convolutional Networks | 1 + ...ation with Normalized Annealing Regularization | 1 + ...Linear Bandits with Finite Adversarial Actions | 1 + ...nima Efficiently in Decentralized Optimization | 0 ...fe Zones of Markov Decision Processes Policies | 1 + ...Using a Correlation-Aware Homography Estimator | 1 + data/2023/neurips/Fine-Grained Visual Prompting | 1 + ...tic Guarantees for Treatment Effect Estimation | 1 + .../neurips/Flat Seeking Bayesian Neural Networks | 1 + ...vate Prompt Learning for Large Language Models | 1 + .../Flow Factorized Representation Learning | 1 + ...Infrastructure Cooperative 3D Object Detection | 1 + ...: Per-instance Personalized Federated Learning | 1 + ...us Your Attention when Few-Shot Classification | 0 ...l Mining Transformer for Few-Shot Segmentation | 1 + ...g Contextual Bandits via Post-serving Contexts | 1 + ...ete Probability Flow Through Optimal Transport | 1 + ...ries Forecasting from a Pure Graph Perspective | 1 + ...D Hand Representation Using Fourier Query Flow | 1 + ...orial and Mixed-variable Bayesian Optimization | 1 + .../Frequency Domain-Based Dataset Distillation | 1 + ... Effective Learners in Time Series Forecasting | 1 + ... 
Language Models to Pre-trained Machine Reader | 1 + ...e Negative Depth to Edge Heterophily in Graphs | 1 + ...Protein Pocket Design via Iterative Refinement | 1 + ...ian Pseudocoreset for Bayesian Neural Networks | 1 + ...ivity of Reducible Hyperbolic Tangent Networks | 1 + ... Abnormality for Out-of-distribution Detection | 1 + ... A Library for Gaussian Processes in Chemistry | 1 + ...r Instantaneous Graph Learning Model Selection | 1 + ...deo Generation via GLOBal Guided Video DecodER | 1 + .../2023/neurips/GMSF: Global Matching Scene Flow | 1 + ...rk For Interpreting Artificial Neural Networks | 1 + ...ining of Spatio-Temporal Graph Neural Networks | 1 + ...nguage Model to Use Tools via Self-instruction | 1 + ...ner Common Information Variational Autoencoder | 1 + .../neurips/Gaussian Membership Inference Privacy | 1 + ...ction and Application to High-dimensional Data | 1 + ...ess Probes (GPP) for Uncertainty-Aware Probing | 1 + ...ale Benchmark for Detecting AI-Generated Image | 1 + ... Surface Reconstruction from Multi-View Images | 1 + ...tific Simulators via Amortized Cost Estimation | 1 + data/2023/neurips/Generalized Belief Transport | 1 + ...ls by Removing Label Bias in Foundation Models | 1 + ...ted Path Consistency for Mastering Atari Games | 1 + ...s between subsampling and ridge regularization | 1 + ...formance in extreme multi-label classification | 1 + ... Diverse Policies with Latent Diffusion Models | 1 + ...ar SDEs with Additive and Multiplicative Noise | 1 + ...erse Evaluation Dataset for Object Recognition | 1 + ...desic Multi-Modal Mixup for Robust Fine-Tuning | 1 + ...ometric Analysis of Matrix Sensing over Graphs | 1 + ...er Neural Network without Overparameterization | 0 ...onary Learning via Matrix Volume Optimization" | 0 ...fusion Process for Low-light Image Enhancement | 1 + ...t-Based Feature Learning under Structured Data | 1 + ...Language Generation with Large Language Models | 1 + ...vised learning of latent temporal dependencies | 1 + ...Datasets and Evaluations for Accident Analysis | 1 + ...ann Manifold Flows for Stable Shape Generation | 1 + ...ation with Grounded Models for Embodied Agents | 1 + data/2023/neurips/Group Fairness in Peer Review | 1 + ... in Federated Learning with Pre-Trained Models | 1 + ...med Representations for Dexterous Manipulation | 1 + ...Comprehensive Assembly Knowledge Understanding | 1 + ...Human Handover Dataset with Large Object Count | 1 + ...ce Properties of Text-Guided Image Classifiers | 1 + ...the power of choices in decision tree learning | 1 + ...rarchical Encoding-based Neural Representation | 1 + ...l Learning: Rethinking Obscured Sub-optimality | 1 + .../Hierarchical Multi-Agent Skill Discovery | 0 ...th Application to Diffusion Model Acceleration | 1 + ...odel Evaluation with Conditional Randomization | 1 + ...ensional Asymptotics of Denoising Autoencoders | 1 + .../Holistic Evaluation of Text-to-Image Models | 1 + ... of NeuralODEs for accurate dynamics discovery | 1 + ... State of Instruction Tuning on Open Resources | 1 + .../How Re-sampling Helps for Long-Tail Learning? 
| 1 + ...ical abilities in a pre-trained language model | 1 + ...Model Shift and Model Bias Policy Optimization | 1 + data/2023/neurips/How to Scale Your EMA | 1 + ...wledge Graph Embeddings into Generative Models | 1 + ...ting via Hub Generation and Pin-hub Connection | 1 + ...uman-Guided Complexity-Controlled Abstractions | 1 + ...or Deep Stimulus Encoding in Visual Prostheses | 1 + ...nstructing Facial Skin-Spectra from RGB Images | 1 + ...oosts Fine-Grained Learning from Coarse Labels | 1 + ...ersible Backdoor Attacks in Federated Learning | 1 + ...es Inversely Correlated on Real-world Datasets | 1 + ...ntation Method for Training Robust Classifiers | 1 + ...oom and Spatial Biases in Image Classification | 1 + ...Human Preferences for Text-to-Image Generation | 1 + ...hesis with Scene Graph Hallucination Diffusion | 0 ...ion: Theoretical Justifications and Algorithms | 1 + .../Imitation Learning from Vague Feedback | 1 + ...r Logistic Regression at the Edge of Stability | 1 + .../Implicit Manifold Gaussian Process Regression | 1 + ...n in Over-Parameterized Support Vector Machine | 1 + ...onal Inference for High-Dimensional Posteriors | 1 + ...variance regularization in non-contrastive SSL | 1 + ...dits Using Tail Bounds for Martingale Mixtures | 1 + ...Yield Reduced Social Welfare Under Competition | 1 + ...General Regularizers and Multiple Optimal Arms | 1 + ...ipped Gradient Methods with Heavy Tailed Noise | 1 + ...mproving Robustness with Adaptive Weight Decay | 1 + ...presentations using human similarity judgments | 1 + ...or Calibrated and Consistent Learning to Defer | 1 + ...ls Large Language Models' Strengths and Biases | 1 + ...Context Learning Unlocked for Diffusion Models | 1 + .../Individual Arbitrariness and Group Fairness | 0 ...ferring Hybrid Neural Fluid Fields from Videos | 1 + .../Inferring the Future by Imagining the Past | 1 + ...amfer Distance Loss for Point Cloud Completion | 1 + ...ompt Tuning for Natural Language Understanding | 1 + ...eometry of the Retinal Representation Manifold | 1 + ...ine Approach for Partially Observable Problems | 1 + ...exity of Linear Predictors and Neural Networks | 1 + .../Inner Product-based Neural Network Similarity | 1 + ...r: Instruction-driven Physics-based Characters | 1 + ...ng Anybody in Diffusion Models via Celeb Basis | 1 + ...f Language Model with Sparse Human Supervision | 1 + ...Interactive Visual Reasoning under Uncertainty | 1 + ...Scale: Identifying Causal Mechanisms in Alpaca | 1 + ...e Prototype-based Graph Information Bottleneck | 1 + ...ion for Robust Detection of AI-Generated Texts | 1 + ...Probability of Sufficient and Necessary Causes | 1 + ...s Good Representations for Multitask Imitation | 1 + ...ent Learning with the Average Reward Criterion | 1 + ...ity Estimation for Safe Reinforcement Learning | 1 + ... Optimizing the Jaccard Index with Soft Labels | 1 + ...Jailbroken: How Does LLM Safety Training Fail? | 1 + ...ampling Based Conditional Independence Testing | 0 ...wledge Distiller for Any Teacher-Student Pairs | 1 + ...rnel Quadrature with Randomly Pivoted Cholesky | 1 + ... Sound Symbolism in Vision-and-Language Models | 1 + ... Efficient Low-Rank Permutation Representation | 1 + .../neurips/Knowledge Diffusion for Distillation | 1 + ...stillation Performs Partial Variance Reduction | 1 + ... Any-level Descriptions using Diffusion Priors | 0 ... Generalization Bounds and Confidence Boosting | 0 ... 
Perfect linear concept erasure in closed form | 1 + ...Knowledge Transfer for Lifelong Robot Learning | 1 + ...e Models in Text-to-Image Synthesis Evaluation | 1 + ...edical Imaging via Second-order Graph Matching | 1 + data/2023/neurips/Label Poisoning is All You Need | 1 + ...Model Inversion Attacks via Knowledge Transfer | 1 + ...iffusion Models for Learning from Noisy Labels | 1 + ... Lagrangian Fluid Mechanics Benchmarking Suite | 1 + data/2023/neurips/Langevin Quasi-Monte Carlo | 1 + ...Need: Aligning Perception with Language Models | 1 + ...ful Explanations in Chain-of-Thought Prompting | 1 + ...: Embodied Experiences Enhance Language Models | 1 + ...ers: Towards Unsupervised Text-Image Alignment | 1 + ... Spaces Improve Video Self-Supervised Learning | 1 + ... Semi-Parametric Reinforcement Learning Agents | 1 + ...guage Models are Visual Reasoning Coordinators | 1 + ...Language Models can Implement Policy Iteration | 1 + ...rk Dataset for Large-Scale Traffic Forecasting | 1 + ...dient Primal-Dual Methods for Constrained MDPs | 1 + .../neurips/Latent SDEs on Homogeneous Spaces | 1 + .../Latent exploration for Reinforcement Learning | 1 + ...ling - Defusing Neighborhood Explosion in GNNs | 1 + ...Self-Coding for Generalized Category Discovery | 1 + ...ensity Fields for Clean Cryo-ET Reconstruction | 1 + ...arning Causal Models under Independent Changes | 1 + .../neurips/Learning Cuts via Enumeration Oracles | 1 + .../Learning DAGs from Data with Few Root Causes | 1 + ...Highly-accurate Cross-view Camera Localization | 1 + ...ia Semipermeable Maximum Likelihood Estimation | 1 + .../2023/neurips/Learning Functional Transduction | 1 + ...ecognition Representations Without Real Humans | 1 + ...ecular Representation in Latent Discrete Space | 1 + ...Property Prediction via Graph Segment Training | 1 + ...raphical Models via Bridge-Block Decomposition | 1 + ...Neural Fields via Context Pruned Meta-Learning | 1 + ...l Domain-Invariant Representations for Ranking | 1 + ...ion Refinement for Unsupervised Face Animation | 1 + ...Free Bayesian Inference in Constrained Domains | 1 + ... Regularized Monotone Graphon Mean-Field Games | 1 + ...s Using An Intra-class Correlation Regularizer | 1 + ...ety Constraints from Multi-task Demonstrations | 1 + ...or Individual Neurons from Population Dynamics | 1 + ...ing Trajectories are Generalization Indicators | 1 + ...rsal Policies via Text-Guided Video Generation | 1 + ...rning Visual Prior via Generative Pre-Training | 1 + ...an Involvement through Proxy Value Propagation | 1 + ...rse Locations for Long-tailed Object Detection | 1 + ...vian Decision-Making from State-only Sequences | 1 + ...istributions for Out-of-distribution Detection | 1 + ...rning to Group Auxiliary Datasets for Molecule | 1 + ...Attributes for Open-set Fine-grained Retrieval | 1 + ...p: Intervention-Aware Concept Embedding Models | 1 + .../Learning to Taste: A Multimodal Wine Dataset | 1 + ... Ordering Alignment for Ordinal Classification | 1 + ...ize World Models for Model-based Task Planning | 1 + ... 
to demonstrate convergence of neural networks | 1 + data/2023/neurips/Lexinvariant Language Models | 1 + ...Linker Co-Design with 3D Equivariant Diffusion | 1 + ...al Lithography for Semiconductor Manufacturing | 1 + .../Lo-Hi: Practical ML Drug Discovery Benchmark | 1 + ...-of-Distribution Detection via Prompt Learning | 1 + ...rated Learning with Isolated Subspace Training | 1 + ...om Stationary Signals with Recovery Guarantees | 1 + data/2023/neurips/Logarithmic Bayes Regret Bounds | 1 + .../Long-Term Fairness with Unknown Dynamics | 1 + ... of Temporal Difference Reinforcement Learning | 1 + ... Compression with Conditional Diffusion Models | 1 + ...t Object Learning with Mutual Exclusivity Bias | 1 + ...Bounds on Adaptive Sensing for Matrix Recovery | 1 + ...e Repainting for Lighting-Realistic Generation | 1 + ...nt Cloud-Based Intra-Patient Lung Registration | 0 ...al of Machine Learning for Materials Discovery | 1 + ...Chinese Historical Document Analysis Benchmark | 1 + ...ngual And Document-Level Large Audited Dataset | 1 + ...on-byte Sequences with Multiscale Transformers | 1 + ... for Compact and Efficient Vision Transformers | 0 ... for Two-Sample Testing Without Data Splitting | 1 + ...pler for MicroMotion-based Gait Classification | 1 + ...achine learning detects terminal singularities | 1 + ...ent by Wire-Mask-Guided Black-Box Optimization | 1 + ...ny-body Approximation for Non-negative Tensors | 1 + ...ibutionally Equivalent Model Extraction Attack | 1 + ...ing for Efficient Dynamic Scene Reconstruction | 1 + ...ave a Role in Mathematical Architecture Design | 1 + data/2023/neurips/Max-Sliced Mutual Information | 1 + ...p Learning with Adversarial Ranking Robustness | 1 + ...Set: Self-Training through Dynamic Programming | 1 + ...ic Pre-Training for 3D Molecular Conformations | 1 + ...al Aggregation on Multi-Scaled Graph Hierarchy | 1 + ...ng Medical Prescriptions and Satellite Imagery | 1 + ... Optimizer with Momentum for Few-Shot Learning | 0 .../Meta-Learning with Neural Bandit Scheduler | 0 ...a-in-context learning in large language models | 1 + ...king networks using simulation-based inference | 1 + ...g and Enhancing In-Network Regular Expressions | 1 + ...Shape-Image-Text Aligned Latent Representation | 1 + ...kernels and neural networks in fixed dimension | 1 + ... Learning Environments for Goal-Oriented Tasks | 1 + ... of Evolving Tasks with Performance Guarantees | 1 + ...ization Guarantees for Representation Learning | 1 + .../Minimum-Risk Recalibration of Classifiers | 1 + ...nsformers via Regularized Nonlocal Functionals | 1 + ...gating Test-Time Bias for Fair Image Retrieval | 1 + ...Incidental Correlations on Part-based Learning | 1 + ... Filtering: A Dimensional Collapse Perspective | 0 ...FormerV2: Efficient Fully Transformer Tracking | 1 + ...arning to Train Transformers with Transformers | 1 + ...l Alignment on Causal and Moral Judgment Tasks | 1 + ...Learning with Meta-Learned Masked Auto-Encoder | 1 + ...et Weakly-Supervised Audio-Visual Event Parser | 1 + ...or Bayesian Neural Networks in Mutual Learning | 1 + ...odel-Based Control with Sparse Neural Dynamics | 1 + ...dient Methods: Theory and Practical Algorithms | 1 + ...erior Sampling via Learning Rate Randomization | 1 + ...ditive Mechanism Shift Variational Autoencoder | 1 + ...stillation for Multimodality Foundation Models | 1 + .../Moment Matching Denoising Gibbs Sampling | 1 + ...ive Video Moment Retrieval from Random to Real | 1 + .../Momentum Provably Improves Error Feedback! 
| 1 + ...A Simple Sub-Quadratic GEMM-Based Architecture | 1 + ...e Carlo Tree Search with Boltzmann Exploration | 1 + ... 3D Expressive Whole-body Human Motion Dataset | 1 + ...g with Heterogeneous Linear Contextual Bandits | 0 ...Sharper Convergence Rates with Task Similarity | 1 + ...or Multi-Source Unsupervised Domain Adaptation | 1 + ...vised Rigid Segmentation and Motion Estimation | 1 + ...lti-modal Queried Object Detection in the Wild | 1 + .../Multi-scale Diffusion Denoised Smoothing | 1 + ...g: Simple and Intuitive Weak Learning Criteria | 1 + ...ormality on Null Covariates in High-Dimensions | 1 + ...timation of Targeted Average Treatment Effects | 1 + ...ith High-Quality 3D Shape and Pose Annotations | 1 + ...rk for Deep Learning on non-Cartesian Lattices | 1 + ...forcement Learning with Function Approximation | 1 + ...Linear Time Algorithm for the Chamfer Distance | 0 ...timal k-Clustering in the Sliding Window Model | 1 + .../Nearest Neighbour with Bandit Feedback | 1 + .../Nearly Optimal Bounds for Cyclic Forgetting | 0 ...timal Decision Trees using Dynamic Programming | 1 + ...isson Compressed Sensing in the Olfactory Bulb | 1 + ...ontext Pretraining for Neural Spiking Activity | 1 + ...on of Latent Representations on Dynamic Scenes | 1 + data/2023/neurips/Neural Functional Transformers | 1 + ... with Neural Stochastic Differential Equations | 1 + ...eneralization, Robustness, and Spectral Biases | 1 + ...ral Lyapunov Control for Discrete-Time Systems | 1 + ...ed Learning Framework for Improved Reliability | 0 ...Neural Priming for Sample-Efficient Adaptation | 1 + data/2023/neurips/Neural Processes with Stability | 1 + ...rchical Exponential-family Energy-based Models | 1 + ...stimation by Learning Neural Gradient Function | 1 + ...nary Optimizers for Deep Learning Applications | 1 + ...Tuning of Regression Problems Across Instances | 1 + ... Model Change Maximization for Active Learning | 0 ...y Detection by Point Sequential Reconstruction | 1 + ...otic Analysis of a UCB-based Top Two Algorithm | 1 + ...neralization Bounds for Sparse Neural Networks | 1 + ...re All That Sharpness-Aware Minimization Needs | 1 + ...Normalizing flow neural networks by JKO scheme | 1 + ...Analysis and Mitigation of Reasoning Shortcuts | 1 + ...ed Dataset of Interleaved Image-Text Documents | 1 + ...t Model-free Reinforcement Learning for POMDPs | 1 + ...RTS: Towards Open-Vocabulary Part Segmentation | 1 + ...nchmarks from lightweight tracking annotations | 0 ...cy Prediction Benchmark for Autonomous Driving | 1 + .../OceanBench: The Sea Surface Height Edition | 1 + ...ning with Variational Counterfactual Reasoning | 1 + ...Representations for Generalizability in POMDPs | 1 + ...Approximations to the Gaussian Mixture Entropy | 1 + ...mpling from Gaussian and Product Distributions | 1 + ...eneralization Bounds for Projective Clustering | 1 + ...asked Pre-training and the Marginal Likelihood | 1 + ...ning with Experts: Algorithms and Lower Bounds | 0 ... network posteriors: a variational perspective | 1 + ...tions in distillation: does it pay to disobey? | 1 + ...s of Out-of-distribution Generalization Models | 1 + ... 
Best-Arm Identification with Fixed Confidence | 1 + ...ning Data Diversity and Fine-tuning Robustness | 1 + ...Solution of Shuffling-Type Gradient Algorithms | 1 + ...ratically-Bounded Losses: an Improved Analysis | 1 + ...nd Interpretability of Gaussian Process Models | 1 + ...of Sparse ICA without Assuming Non-Gaussianity | 0 ...e Overlooked Structure of Stochastic Gradients | 1 + ...rge Language Models - A Critical Investigation | 1 + ...bustness of Removal-Based Feature Attributions | 1 + ... Networks: Exponential Gaps for Long Sequences | 1 + ...ization in Adversarially Robust Classification | 1 + ...er-class Diversity for Supervised Pre-training | 1 + ...the spectral bias of two-layer linear networks | 1 + ... on Model-Based Offline Reinforcement Learning | 1 + ...eneous Architectures in Knowledge Distillation | 1 + ...e-step differentiation of iterative algorithms | 1 + ...odels under Concept Drift by Online Ensembling | 1 + ...arning: Provable Guarantees for Generalization | 0 .../neurips/Online Control for Meta-optimization | 0 ...imal Dynamic Regret meets Practical Algorithms | 1 + ...arning under Adversarial Nonlinear Constraints | 1 + ...Planning with Anytime Deterministic Guarantees | 1 + .../Online robust non-stationary estimation | 0 ...t Style Compensation for Semantic Segmentation | 1 + ...elation-Oriented Multimodality Model Prompting | 1 + ...sk3D: Open-Vocabulary 3D Instance Segmentation | 1 + ...nchmark of Spatio-Temporal Predictive Learning | 1 + ...y Stopping for Robustifying Differentiable NAS | 1 + ...al Fields: Tackling PDEs on General Geometries | 1 + ...thms for the Inhomogeneous Spiked Wigner Model | 1 + ...ction for Graph-based Semi-supervised Learning | 1 + ...escent in Discounted Markov Decision Processes | 1 + ...iational Inequalities with Separable Structure | 1 + ...zation Methods Under a Fixed Computation Model | 1 + ...imal Transport for Treatment Effect Estimation | 1 + ...ned test statistics across independent studies | 1 + ...nt Policy Optimization Framework for Online RL | 1 + ...ptimizing Prompts for Text-to-Image Generation | 1 + ...of Dataset Imbalance for Multilingual Learning | 1 + ...ng with Unreliable Out-of-distribution Sources | 1 + ...ficient Zero-Shot TTS through Speech Prompting | 1 + ...rning Linear Thresholds from Label Proportions | 1 + ...Accurate Long Rollouts with Neural PDE Solvers | 1 + ... Paragraph via Latent Language Diffusion Model | 1 + ...Object Search in Partially Unknown Environment | 0 ...odel for Patient-independent Seizure Detection | 1 + ...n for Enhanced Self-Supervised Image Denoising | 1 + ...echnique for Detecting Poisoned Samples in NLP | 1 + .../Parallel Submodular Function Minimization | 1 + ...mentoring for Offline Model-based Optimization | 1 + ...ed text, but retrieval is an effective defense | 1 + ...y Propagation guided Candidate Label Shrinkage | 0 ...articipatory Personalization in Classification | 1 + ...nce with Generalized Wasserstein Gradient Flow | 1 + ...re Data-Efficient Training of Diffusion Models | 1 + ...ransformer for any Aspect Ratio and Resolution | 1 + ... effect of the learning rate, depth, and width | 1 + ...ine Inference of Different Physical Properties | 1 + ...nE: Representation Learning over Planar Graphs | 1 + ... 
of protein families as sequences-of-sequences | 1 + ...olicy Space Diversity for Non-Transitive Games | 1 + ...Reconstruction via Guided Set Diffusion Models | 1 + ...tain Structured Strong Winning Lottery Tickets | 1 + ...of Language Models Can Improve Language Models | 1 + ...ata for Improving Utility on Selected Measures | 1 + .../PrObeD: Proactive Object Detection Wrapper | 1 + ...tation Nowcasting with Latent Diffusion Models | 1 + ...trix Factorization via Scaled Gradient Descent | 1 + ... Protein's Stability under a Million Mutations | 1 + ...nd Control in Continual Reinforcement Learning | 1 + ...Asymmetric Kernel SVD in Primal Representation | 1 + ...ation Trade-off in Distributed Mean Estimation | 1 + ...econd-Order Stationary Points and Excess Risks | 1 + ...for stochastic block models and mixture models | 1 + ...d Multimodal Dataset for Molecular Biology Lab | 1 + ...obing Privacy Leakage in Large Language Models | 1 + ...nt Learning with Randomized Linear Classifiers | 1 + ...l Optimization with Convex Lower-level Problem | 1 + ...x Optimization via Efficient Newton Iterations | 1 + ...IR: Prompting for All-in-One Image Restoration | 0 ... Knowledge Updates to LMs Through Distillation | 1 + ...arks for Protein Fitness Prediction and Design | 0 ... Folding on Diverse Tasks, Models, and Metrics | 1 + ...ling, Noise-Contrastive Estimation, and beyond | 1 + .../neurips/Provable benefits of score matching | 1 + ...guarantees for black-box variational inference | 1 + ...More) Sample-Efficient Offline RL with Options | 1 + .../Provably Bounding Neural Network Preimages | 1 + ...-Informed Calibration for Deep Neural Networks | 1 + ...An Efficient Low-bit Quantized Diffusion Model | 1 + ...ltonian Prediction Benchmark for QM9 Molecules | 1 + ...Outliers by Helping Attention Heads Do Nothing | 1 + ...n with Explicit Motion for 3D Object Detection | 0 ...eneration via Large Mixture of Diffusion Paths | 1 + ...active Defense Against Model Poisoning Attacks | 1 + ...pling and Denoising for Real-Time Path Tracing | 1 + ...a Benchmark for Radiology Language Evaluations | 1 + ... 
and Pre-trained Models for Continual Learning | 1 + ...dom Cuts are Optimal for Explainable k-Medians | 1 + ...for Efficient and Accurate 3D Object Detection | 0 ...adient Descent and Small Random Initialization | 1 + ...er Training on Efficient Panoptic Segmentation | 1 + ...lay Atari with the Help of Instruction Manuals | 1 + ...lyline Transformer with Relative Pose Encoding | 1 + ...asting Continual Learning as Sequence Modeling | 1 + ...on-Convex Iteratively Reweighted Least Squares | 1 + .../Recurrent Temporal Revision Graph Networks | 1 + ...ion for Length Generalization with Scalability | 1 + ...p Neural Networks with Feature Synthesis Tools | 1 + ...+: (In)Stability and Fast Convergence in Games | 1 + ...ng for Discounted MDPs with Short Burn-In Time | 1 + ...hearsal Learning for Avoiding Undesired Future | 0 ...ugh Data-Centric AI: A Comprehensive Benchmark | 1 + ...for Fine-tuning Text-to-Image Diffusion Models | 1 + ...cement Learning with Fast and Forgetful Memory | 1 + ...cement Learning via Representation Distinction | 1 + ...ing Perspective to (Unbalanced) Classification | 1 + ...le Off-Policy Learning for Dosage Combinations | 1 + ...al Text Degeneration from the Data Perspective | 1 + .../Replicability in Reinforcement Learning | 1 + ...: a Framework for Alias-free Operator Learning | 1 + ...ce Learning: A Case For Algorithmic Unit Tests | 1 + ...or Image Super-resolution by Residual Shifting | 1 + ...g the Optimizer in Deep RL: An Empirical Study | 1 + ...Uncovering the Mechanisms of Residual Networks | 1 + data/2023/neurips/Resilient Constrained Learning | 1 + ...uling: An LLM-Empowered LLM Inference Pipeline | 1 + .../Responsible AI (RAI) Games and Ensembles | 1 + ...rt Sampling for Improving Generative Processes | 1 + ...eaking the Uniform Global Attractor Assumption | 1 + ...Architectures Make for Fairer Face Recognition | 1 + ...stems: Are Monotone Rewards Always Beneficial? | 1 + ...rking video action recognition under occlusion | 0 ...lation from the Perspective of Robust Fairness | 0 ...aining and Generalization across Threat Models | 1 + ...LP: Benchmarks, Analysis, and LLMs Evaluations | 1 + ...ng the Evaluation of Image Synthesis with GANs | 1 + ... with Sketching for Contextual Batched Bandits | 1 + ...Distribution Estimation and Reward Improvement | 1 + ...polating weights fine-tuned on diverse rewards | 1 + ... for Language-Supervised Semantic Segmentation | 1 + .../neurips/Riemannian Residual Neural Networks | 1 + ...ess-Aware Minimization on Riemannian Manifolds | 0 ...ptimization methods avoid strict saddle points | 1 + ... 
Timely Outcome Prediction under Cost Pressure | 1 + data/2023/neurips/Robust Bayesian Satisficing | 1 + ...abel Noise via Maximizing Re-labeling Accuracy | 1 + ...ve Data Expansion Against Spurious Correlation | 1 + ...Robust Matrix Sensing in the Semi-Random Model | 0 ...on Without Moments for Symmetric Distributions | 1 + ...easoning and Fitting via Dual Sparsity Pursuit | 0 ...ively Secure Serverless Collaborative Learning | 1 + ...ith missing values and cell-wise contamination | 1 + ...aining via approximate orthonormal constraints | 1 + ...s Solver for Fast Sampling of Diffusion Models | 1 + ...arning attack on LWE with sparse small secrets | 0 ...re-aware Shapley-based Multipiece Explanations | 0 ...c-Aware Normalizing Flow for Anomaly Detection | 1 + ...istration for Robust 6D Object Pose Estimation | 1 + .../SE(3) Equivariant Augmented Coupling Flows | 1 + ...st High-Quality Sampling from Diffusion Models | 1 + ...t between human and machine visual abstraction | 1 + ... Structured Constrained Nonconvex Optimization | 1 + ...ixing for Distillation with Unlabeled Examples | 1 + ...luster for Referring Video Object Segmentation | 1 + ...al Alignment Perspective for Domain Adaptation | 1 + ... Contribution Evaluation in Federated Learning | 1 + ...Spiked Random Model for Reinforcement Learning | 1 + ...ith Spatiotemporal Annotations of Sound Events | 1 + ...rative Model for Text-to-Behavior in Minecraft | 1 + ...lator for Machine Learning in Particle Physics | 1 + ...for Enhanced Molecular Representation Learning | 0 ...ion Learning with Non-Preferred Demonstrations | 1 + ...n of Decision-Tree Policies in Continuous Time | 1 + ...hing: Causal Discovery and Generative Modeling | 1 + ...onditioned Hierarchical Reinforcement Learning | 0 ...bjective Molecular Optimization with GFlowNets | 1 + ... using Remote Sensing and Citizen Science Data | 0 ...ed Language Models Using Declarative Prompting | 1 + .../Scalable 3D Captioning with Pretrained Models | 1 + .../neurips/Scalable Fair Influence Maximization | 1 + ...Model via Scaling Network Long Skip Connection | 1 + .../Scaling Open-Vocabulary Object Detection | 1 + .../neurips/Scaling Riemannian Diffusion Models | 1 + ...ling laws for language encoding models in fMRI | 1 + ...s and Token Composition in 1-layer Transformer | 1 + ...ble Driving Scenario Generation With Diffusion | 1 + ... Step-sizes with Multidimensional Backtracking | 1 + ...n Task Generalization with Energy-Based Models | 1 + ...orcement Learning against Spurious Correlation | 1 + ...ion Refinement with Discrete Diffusion Process | 1 + .../2023/neurips/Segment Anything in High Quality | 1 + ...proach to Forgetting in Deep Generative Models | 1 + ...es Improves Multi-Agent Reinforcement Learning | 1 + ...ainst On-body Displacement of Flexible Sensors | 1 + ... for Video Localization and Question Answering | 1 + ... Optimization through Bayesian Active Learning | 1 + data/2023/neurips/Self-Predictive Universal AI | 1 + ... and more human-aligned visual representations | 0 ...ntial Subset Matching for Dataset Distillation | 1 + ...n Pretrained Language Models through Honeypots | 1 + ...ised Functional Map Regularized Reconstruction | 1 + ...ize Sharpness To Achieve Better Generalization | 1 + ...dent Networks Copy or Average Teacher Weights? 
| 1 + ...ramework for Multi-modal Domain Generalization | 1 + ...ive Clustering via One-Dimensional Projections | 1 + ...g Transformers for Large-Graph Representations | 1 + ...ies: Improved Analysis under Weaker Conditions | 1 + ...Contexts into Generative Commonsense Reasoning | 1 + ...aptive Regularization with Frequent Directions | 1 + .../Slot-guided Volumetric Object Radiance Fields | 1 + .../Small batch deep reinforcement learning | 1 + ...mmetrization for deep learning on point clouds | 1 + ... Analysis of Sequential Probability Assignment | 1 + ...ust Test-Time Adaptation on Noisy Data Streams | 1 + ...l Motion Prediction with Cognitive Hierarchies | 1 + ...Efficient Training of Attention-based Networks | 0 ...vex Minimax Optimization in Federated Learning | 1 + ... for Time Series Data: Theory and Applications | 1 + ...lar Activation for Efficient Sequence Modeling | 1 + ...meterization for Epitomic Dataset Distillation | 0 ...lly Private Training of Large Embedding Models | 1 + ...annels, shape bias, and adversarial robustness | 1 + ...ology Images via Bi-modal Contrastive Learning | 1 + ...Estimation for Low-Rank Reinforcement Learning | 1 + data/2023/neurips/Spike-driven Transformer | 1 + ...mmetry breaking in generative diffusion models | 1 + ...ilies: A New Class of Tractable Density Models | 1 + ...ation at ImageNet Scale From A New Perspective | 1 + ...tance Functions and Finer Shape Representation | 1 + ...g Societal Representations in Diffusion Models | 0 ...ent Homology using Signed Barcodes as Measures | 1 + ...d Learning for Federated Domain Generalization | 1 + ... Fourier Transform for Representation Learning | 1 + ...ased Representations for Off-Policy Evaluation | 1 + ...approximators with exponential decaying memory | 1 + ... Benefit Agent Learning and User Understanding | 1 + ...Attacks in the Context of Selective Forgetting | 0 ...te Learning Process in Quantum Neural Networks | 1 + ...Knowledge Assessment for Large Language Models | 1 + ...l Trade-off in Multi-Agent Multi-Armed Bandits | 1 + ...ce Assessment through Conditional Permutations | 1 + .../neurips/Stein \316\240-Importance Sampling" | 1 + ...le Free Sampling of Molecular Transition Paths | 1 + ...d Benchmark for Continuous Story Visualization | 1 + data/2023/neurips/Strategic Apple Tasting | 1 + ...nds for Estimating Correlation Clustering Cost | 0 .../2023/neurips/Streaming PCA for Markovian Data | 1 + ...ulation of Human Percepts via Robustified ANNs | 1 + ...Neural Inverse Graphics from a Pile of Objects | 1 + data/2023/neurips/Structure of universal formulas | 1 + ...ks for Density Estimation and Causal Inference | 1 + ...o-End Stability and Output Tracking Guarantees | 1 + ...rediction with Stronger Consistency Guarantees | 1 + ...tyleDrop: Text-to-Image Synthesis of Any Style | 0 ...StyleGAN knows Normal, Depth, Albedo, and More | 1 + ...roportional high-dimensional Linear Regression | 1 + ...unterexample for the Success of Early Stopping | 1 + .../Successor-Predecessor Intrinsic Exploration | 1 + ...braic Decomposition via Reinforcement Learning | 0 ...Survival Analysis with Time-Varying Covariates | 0 ...ng Environments for Sustainable Energy Systems | 1 + ...e Prompt Adaptation for Vision-Language Models | 1 + ...orcement Learning for Adaptive Mesh Refinement | 1 + ...nd Slow Thinking for Complex Interactive Tasks | 1 + ...Molecules, Proteins, and Crystalline Materials | 1 + ... Pose Estimation with Geometric Reconstruction | 1 + ... 
through Object-Centric Relational Abstraction | 1 + ...n-world Compositional Text-to-image Generation | 1 + ...rastive Loss for Visual Reinforcement Learning | 1 + ...omplex Reasoning over Temporal Knowledge Graph | 1 + data/2023/neurips/TOA: Task-oriented Active VQA | 1 + ...auditing training data for improved regression | 1 + ...x Optimal and Instance-Dependent Regret Bounds | 1 + ...g Self-Attention for Graph via Rooted Subtrees | 1 + ...ects in Graph-based Spatiotemporal Forecasting | 1 + ...atures for Scalable Molecular Machine Learning | 1 + ... Space: Improved Editing of Pre-Trained Models | 1 + ...e Weight Analysis, and Neural Network Training | 1 + ...Prior Compensation for Human Motion Prediction | 1 + ...rse Classifier for Fine-Grained Classification | 1 + ...g for Matching-based Video Object Segmentation | 1 + ... Efficient Unified Model for Massive NLP Tasks | 1 + .../Textually Pretrained Speech Language Models | 1 + ...and Convergence of Local Bayesian Optimization | 1 + ...ching Consensus and Convergence to Equilibrium | 1 + ... Normalization in Sharpness-Aware Minimization | 1 + ...Reinforcement Learning with a Generative Model | 1 + ...stortion of Binomial Voting Defies Expectation | 1 + ... Stability under Regularized Learning in Games | 1 + .../The Gain from Ordering in Online Learning | 0 ...e Portability and Implications for ML Progress | 1 + ... Subgraph Densities to Stochastic Block Models | 1 + .../The Learnability of In-Context Learning | 1 + ...tperforming Curated Corpora with Web Data Only | 0 ...ly-supervised Whole Slide Image Classification | 1 + ...f Emergent In-Context Learning in Transformers | 1 + ...g Data Representations in Deep Neural Networks | 1 + ...sive power of pooling in Graph Neural Networks | 1 + ...level in linear regression with dependent data | 1 + .../The probability flow ODE is provably fast | 1 + ...tability with respect to distributional shifts | 1 + ...nductive Biases in Deep Convolutional Networks | 0 ...ization, Generalization and Conflict-Avoidance | 1 + ... Large Language Models with External Knowledge | 1 + ...ounds for Volumetric Spanners and Applications | 1 + ... Bounds for Gradient Descent on Separable Data | 1 + ...ransformer for Irregularly Sampled Time Series | 1 + ...guage Models Can Teach Themselves to Use Tools | 1 + ...eep Ensemble Works in Selective Classification | 0 ...-supervised Simplicial Representation Learning | 1 + ...verification and retrieval without fine-tuning | 1 + ...Learning: New Architecture and Unified Library | 1 + ...Initialization: What Makes a Good Sparse Mask? | 0 ...e Study on Overparameterized Linear Regression | 1 + ...bution-Agnostic Generalized Category Discovery | 1 + ...: Characterizing Scaling and Transfer Behavior | 1 + ...ds Higher Ranks via Adversarial Weight Pruning | 1 + .../Towards In-context Scene Understanding | 1 + ...cene Understanding by Vision Foundation Models | 1 + ...ng for Group Robustness with Fewer Annotations | 1 + ...nd Chain of Thought: A Theoretical Perspective | 1 + ...ICD Coding via Tree-based Contrastive Learning | 1 + ...door Purification through Feature Shift Tuning | 1 + ...ark for High-Level Synthesis Targeted to FPGAs | 1 + ... 
of Kernel-based Methods Under Covariate Shift | 1 + ...tive Learning for Disentangled Representations | 1 + ...rons with Clustered Compositional Explanations | 1 + ...n Dataset on Large Tensor Computational Graphs | 1 + ...Fight Easy: Robust Meta Reinforcement Learning | 1 + ...Chain-of-Thought via Latent-Variable Inference | 1 + ...ted Neural Networks is \342\210\203R-Complete" | 1 + ...ges Improves Robustness to Adversarial Attacks | 1 + ...ion for Variable-Sized Text-to-Image Synthesis | 1 + ...of Stability Phenomenon via Bifurcation Theory | 1 + ...bust Generalization for Tensor Neural Networks | 1 + ...for Lidar View Synthesis and 3D Reconstruction | 1 + ...or Continual Knowledge Retention and Promotion | 1 + ... with data-constrained spiking neural networks | 1 + ... Trojan Prompt Attack on Large Language Models | 1 + ...metric Super-Resolution with BLASTNet 2.0 Data | 1 + ... Untuned SGD and the Power of Adaptive Methods | 1 + ...-Stage Learning to Defer with Multiple Experts | 1 + ... Retrieve Any Object via Prompt-based Tracking | 1 + ...A Real-World Dataset for Under-Display Cameras | 0 ...d for Real-Time Rendering of Large-Scale Scene | 1 + ...strained Pose Prior-Free Neural Radiance Field | 1 + ...ommendation Unlearning via Error Decomposition | 0 ...odels with structured discrete representations | 1 + ... Network for Cross-Domain Video-Text Retrieval | 0 ...e Instance Reweighting for Off-Policy Learning | 1 + ...Unconstrained Dynamic Regret via Sparse Coding | 1 + ...r Weakly Open-Vocabulary Semantic Segmentation | 1 + ...-supervised Learning under Distribution Shifts | 1 + ...ives as the ELBO with Simple Data Augmentation | 1 + ...dness and Adaptation Difficulty via Attributes | 0 ...Learning via Stage-wise Relaxed Initialization | 1 + ...earning for Out-of-Distribution Generalization | 1 + ...ar property prediction: Insights and Solutions | 1 + ...ental class-level effects of data augmentation | 1 + ...ramework for Fast Sampling of Diffusion Models | 1 + ...rm Convergence with Square-Root Lipschitz Loss | 1 + .../Universality and Limitations of Prompt Tuning | 1 + ...mization in Auditing Differentially Private ML | 1 + .../Unsupervised Anomaly Detection with Rejection | 1 + ...ture Search with Disentangled Self-Supervision | 1 + ...supervised Image Denoising with Score Function | 1 + ...Representation for CT Metal Artifact Reduction | 1 + ... Science Applications of Large Language Models | 1 + .../neurips/Utilitarian Algorithm Configuration | 1 + ...ext Omni-Modality Foundation Model and Dataset | 1 + ...n Vision-Language Tasks via Pre-trained Models | 1 + .../neurips/VaRT: Variational Regression Trees | 1 + ...aling on Graphs for Combinatorial Optimization | 1 + ...n Models as Rewards for Reinforcement Learning | 1 + ...or Keystep Recognition in Instructional Videos | 1 + ...ent between AI and Humans in Visual Perception | 1 + ...g gender bias in image-text pronoun resolution | 1 + ...n Inversion: Image Editing via Image Prompting | 0 ... for Fast Neural Radiance Field Reconstruction | 1 + ...Quantum Many-Body Schr\303\266dinger Equation" | 1 + ... Supervised Learning with Intermediate Targets | 1 + .../What Can We Learn from Unlearnable Datasets? | 1 + ... Trajectory Prediction for Autonomous Driving? | 1 + ... Evaluation of Zero-Shot Semantic Segmentation | 1 + ...ork for Offline Inverse Reinforcement Learning | 1 + ...Nets Outperform Boosted Trees on Tabular Data? | 1 + ...oes Confidence-Based Cascade Deferral Suffice? 
| 1 + ...es Optimizing a Proper Loss Yield Calibration? | 1 + ...rom? Origin Attribution of AI-Generated Images | 0 ...icial Visual Cortex for Embodied Intelligence? | 1 + ...Unseen Novel Categories of Articulated Objects | 1 + ...Aware Minimization Generalize Better Than SGD? | 1 + ...soning emerges from the locality of experience | 1 + ...dal time series for wildfire spread prediction | 0 ... Memory Efficient Adaptation of Language Model | 1 + ...ch Implementations: Guarantees and Limitations | 1 + .../XAGen: 3D Expressive Human Avatars Generation | 1 + .../Zero-One Laws of Graph Neural Networks | 1 + ...ce-Aware Structured Pruning of Language Models | 1 + ...rk for Goal-Conditioned RL using f-Divergences | 1 + ...etter Initialization with Differential Privacy | 1 + .../neurips/rPPG-Toolbox: Deep Remote PPG Toolbox | 1 + ...sentation Learner for Single-Cell RNA-Seq Data | 1 + 8074 files changed, 7920 insertions(+) create mode 100644 data/2018/vldb/Declarative Recursive Computation on an RDBMS create mode 100644 data/2020/neurips/(De)Randomized Smoothing for Certifiable Defense against Patch Attacks create mode 100644 data/2020/neurips/3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data create mode 100644 data/2020/neurips/3D Self-Supervised Methods for Medical Imaging create mode 100644 data/2020/neurips/3D Shape Reconstruction from Vision and Touch create mode 100644 data/2020/neurips/A B Testing in Dense Large-Scale Networks: Design and Inference create mode 100644 data/2020/neurips/A Bandit Learning Algorithm and Applications to Auction Design create mode 100644 data/2020/neurips/A Bayesian Nonparametrics View into Deep Representations create mode 100644 data/2020/neurips/A Bayesian Perspective on Training Speed and Model Selection create mode 100644 data/2020/neurips/A Benchmark for Systematic Generalization in Grounded Language Understanding create mode 100644 data/2020/neurips/A Biologically Plausible Neural Network for Slow Feature Analysis create mode 100644 data/2020/neurips/A Boolean Task Algebra for Reinforcement Learning create mode 100644 data/2020/neurips/A Catalyst Framework for Minimax Optimization create mode 100644 data/2020/neurips/A Causal View on Robustness of Neural Networks create mode 100644 data/2020/neurips/A Class of Algorithms for General Instrumental Variable Models create mode 100644 data/2020/neurips/A Closer Look at Accuracy vs. 
Robustness create mode 100644 data/2020/neurips/A Closer Look at the Training Strategy for Modern Meta-Learning create mode 100644 data/2020/neurips/A Combinatorial Perspective on Transfer Learning create mode 100644 data/2020/neurips/A Computational Separation between Private Learning and Online Learning create mode 100644 data/2020/neurips/A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval create mode 100644 data/2020/neurips/A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions create mode 100644 data/2020/neurips/A Convolutional Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction create mode 100644 data/2020/neurips/A Decentralized Parallel Algorithm for Training Generative Adversarial Nets create mode 100644 data/2020/neurips/A Dictionary Approach to Domain-Invariant Learning in Deep Networks create mode 100644 data/2020/neurips/A Discrete Variational Recurrent Topic Model without the Reparametrization Trick create mode 100644 data/2020/neurips/A Dynamical Central Limit Theorem for Shallow Neural Networks create mode 100644 data/2020/neurips/A Fair Classifier Using Kernel Density Estimation create mode 100644 data/2020/neurips/A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization create mode 100644 data/2020/neurips/A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods create mode 100644 data/2020/neurips/A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding create mode 100644 data/2020/neurips/A Game Theoretic Analysis of Additive Adversarial Attacks and Defenses create mode 100644 data/2020/neurips/A Game-Theoretic Analysis of the Empirical Revenue Maximization Algorithm with Endogenous Sampling create mode 100644 data/2020/neurips/A General Large Neighborhood Search Framework for Solving Integer Linear Programs create mode 100644 data/2020/neurips/A General Method for Robust Learning from Batches create mode 100644 data/2020/neurips/A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks create mode 100644 data/2020/neurips/A Group-Theoretic Framework for Data Augmentation create mode 100644 data/2020/neurips/A Limitation of the PAC-Bayes Framework create mode 100644 data/2020/neurips/A Local Temporal Difference Code for Distributional Reinforcement Learning create mode 100644 data/2020/neurips/A Loss Function for Generative Neural Networks Based on Watson's Perceptual Model create mode 100644 data/2020/neurips/A Matrix Chernoff Bound for Markov Chains and Its Application to Co-occurrence Matrices create mode 100644 data/2020/neurips/A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs create mode 100644 data/2020/neurips/A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings create mode 100644 data/2020/neurips/A Non-Asymptotic Analysis for Stein Variational Gradient Descent create mode 100644 data/2020/neurips/A Novel Approach for Constrained Optimization in Graphical Models create mode 100644 data/2020/neurips/A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances create mode 100644 data/2020/neurips/A Randomized Algorithm to Reduce the Support of Discrete Measures create mode 100644 data/2020/neurips/A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection create mode 100644 data/2020/neurips/A Robust Functional EM Algorithm for Incomplete Panel Count Data create mode 100644 data/2020/neurips/A 
Scalable Approach for Privacy-Preserving Collaborative Machine Learning create mode 100644 data/2020/neurips/A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees create mode 100644 data/2020/neurips/A Self-Tuning Actor-Critic Algorithm create mode 100644 data/2020/neurips/A Simple Language Model for Task-Oriented Dialogue create mode 100644 data/2020/neurips/A Simple and Efficient Smoothing Method for Faster Optimization and Local Exploration create mode 100644 data/2020/neurips/A Single Recipe for Online Submodular Maximization with Adversarial or Stochastic Constraints create mode 100644 data/2020/neurips/A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems create mode 100644 data/2020/neurips/A Spectral Energy Distance for Parallel Speech Synthesis create mode 100644 data/2020/neurips/A Statistical Framework for Low-bitwidth Training of Deep Neural Networks create mode 100644 data/2020/neurips/A Statistical Mechanics Framework for Task-Agnostic Sample Design in Machine Learning create mode 100644 data/2020/neurips/A Stochastic Path Integral Differential EstimatoR Expectation Maximization Algorithm create mode 100644 data/2020/neurips/A Study on Encodings for Neural Architecture Search create mode 100644 data/2020/neurips/A Theoretical Framework for Target Propagation create mode 100644 data/2020/neurips/A Tight Lower Bound and Efficient Reduction for Swap Regret create mode 100644 data/2020/neurips/A Topological Filter for Learning with Label Noise create mode 100644 data/2020/neurips/A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms create mode 100644 data/2020/neurips/A Unified View of Label Shift Estimation create mode 100644 data/2020/neurips/A Unifying View of Optimism in Episodic Reinforcement Learning create mode 100644 data/2020/neurips/A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions create mode 100644 data/2020/neurips/A Variational Approach for Learning from Positive and Unlabeled Data create mode 100644 data/2020/neurips/A causal view of compositional zero-shot recognition create mode 100644 data/2020/neurips/A convex optimization formulation for multivariate regression create mode 100644 data/2020/neurips/A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning create mode 100644 data/2020/neurips/A kernel test for quasi-independence create mode 100644 data/2020/neurips/A mathematical model for automatic differentiation in machine learning create mode 100644 data/2020/neurips/A mathematical theory of cooperative communication create mode 100644 data/2020/neurips/A mean-field analysis of two-player zero-sum games create mode 100644 data/2020/neurips/A meta-learning approach to (re)discover plasticity rules that carve a desired function into a neural network create mode 100644 data/2020/neurips/A new convergent variant of Q-learning with linear function approximation create mode 100644 data/2020/neurips/A new inference approach for training shallow and deep generalized linear models of noisy interacting neurons create mode 100644 data/2020/neurips/A novel variational form of the Schatten-$p$ quasi-norm create mode 100644 data/2020/neurips/A polynomial-time algorithm for learning nonparametric causal graphs create mode 100644 data/2020/neurips/A shooting formulation of deep learning create mode 100644 data/2020/neurips/A simple normative network 
approximates local non-Hebbian learning in the cortex
 create mode 100644 data/2020/neurips/AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity
 create mode 100644 data/2020/neurips/AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection
 create mode 100644 data/2020/neurips/ARMA Nets: Expanding Receptive Field for Dense Prediction
 create mode 100644 data/2020/neurips/AViD Dataset: Anonymized Videos from Diverse Countries
 create mode 100644 data/2020/neurips/Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
 create mode 100644 data/2020/neurips/Acceleration with a Ball Optimization Oracle
 create mode 100644 data/2020/neurips/Achieving Equalized Odds by Resampling Sensitive Attributes
 create mode 100644 data/2020/neurips/Active Invariant Causal Prediction: Experiment Selection through Stability
 create mode 100644 data/2020/neurips/Active Structure Learning of Causal DAGs via Directed Clique Trees
 create mode 100644 data/2020/neurips/AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
 create mode 100644 data/2020/neurips/AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning
 create mode 100644 data/2020/neurips/AdaTune: Adaptive Tensor Program Compilation Made Efficient
 create mode 100644 data/2020/neurips/Adam with Bandit Sampling for Deep Learning
 create mode 100644 data/2020/neurips/Adaptation Properties Allow Identification of Optimized Neural Codes
 create mode 100644 data/2020/neurips/Adapting Neural Architectures Between Domains
 create mode 100644 data/2020/neurips/Adapting to Misspecification in Contextual Bandits
 create mode 100644 data/2020/neurips/Adaptive Discretization for Model-Based Reinforcement Learning
 create mode 100644 data/2020/neurips/Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach
 create mode 100644 data/2020/neurips/Adaptive Gradient Quantization for Data-Parallel SGD
 create mode 100644 data/2020/neurips/Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting
 create mode 100644 data/2020/neurips/Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes
 create mode 100644 data/2020/neurips/Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment
 create mode 100644 data/2020/neurips/Adaptive Online Estimation of Piecewise Polynomial Trends
 create mode 100644 data/2020/neurips/Adaptive Probing Policies for Shortest Path Routing
 create mode 100644 data/2020/neurips/Adaptive Reduced Rank Regression
 create mode 100644 data/2020/neurips/Adaptive Sampling for Stochastic Risk-Averse Learning
 create mode 100644 data/2020/neurips/Adaptive Shrinkage Estimation for Streaming Graphs
 create mode 100644 data/2020/neurips/AdvFlow: Inconspicuous Black-box Adversarial Attacks using Normalizing Flows
 create mode 100644 data/2020/neurips/Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization
 create mode 100644 data/2020/neurips/Adversarial Attacks on Deep Graph Matching
 create mode 100644 data/2020/neurips/Adversarial Attacks on Linear Contextual Bandits
 create mode 100644 data/2020/neurips/Adversarial Bandits with Corruptions: Regret Lower Bound and No-regret Algorithm
 create mode 100644 data/2020/neurips/Adversarial Blocking Bandits
 create mode 100644 data/2020/neurips/Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion
 create mode 100644 data/2020/neurips/Adversarial Distributional Training for Robust Deep Learning
 create mode 100644 data/2020/neurips/Adversarial Example Games
 create mode 100644 data/2020/neurips/Adversarial Learning for Robust Deep Clustering
 create mode 100644 data/2020/neurips/Adversarial Robustness of Supervised Sparse Coding
 create mode 100644 data/2020/neurips/Adversarial Self-Supervised Contrastive Learning
 create mode 100644 data/2020/neurips/Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization
 create mode 100644 data/2020/neurips/Adversarial Sparse Transformer for Time Series Forecasting
 create mode 100644 data/2020/neurips/Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation
 create mode 100644 data/2020/neurips/Adversarial Training is a Form of Data-dependent Operator Norm Regularization
 create mode 100644 data/2020/neurips/Adversarial Weight Perturbation Helps Robust Generalization
 create mode 100644 data/2020/neurips/Adversarial robustness via robust low rank representations
 create mode 100644 data/2020/neurips/Adversarially Robust Few-Shot Learning: A Meta-Learning Approach
 create mode 100644 data/2020/neurips/Adversarially Robust Streaming Algorithms via Differential Privacy
 create mode 100644 data/2020/neurips/Adversarially-learned Inference via an Ensemble of Discrete Undirected Graphical Models
 create mode 100644 data/2020/neurips/Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity
 create mode 100644 data/2020/neurips/Agnostic Learning of a Single Neuron with Gradient Descent
 create mode 100644 data/2020/neurips/Agnostic Learning with Multiple Objectives
 create mode 100644 data/2020/neurips/Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space
 create mode 100644 data/2020/neurips/Algorithmic recourse under imperfect causal knowledge: a probabilistic approach
 create mode 100644 data/2020/neurips/All Word Embeddings from One Embedding
 create mode 100644 data/2020/neurips/All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation
 create mode 100644 data/2020/neurips/Almost Optimal Model-Free Reinforcement Learningvia Reference-Advantage Decomposition
 create mode 100644 data/2020/neurips/Almost Surely Stable Deep Dynamics
 create mode 100644 data/2020/neurips/An Analysis of SVD for Deep Rotation Estimation
 create mode 100644 data/2020/neurips/An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits
 create mode 100644 data/2020/neurips/An Efficient Adversarial Attack for Tree Ensembles
 create mode 100644 data/2020/neurips/An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search
 create mode 100644 data/2020/neurips/An Efficient Framework for Clustered Federated Learning
 create mode 100644 data/2020/neurips/An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits
 create mode 100644 data/2020/neurips/An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay
 create mode 100644 data/2020/neurips/An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch
 create mode 100644 data/2020/neurips/An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods
 create mode 100644 data/2020/neurips/An Improved Analysis of Stochastic Gradient Descent with Momentum
 create mode 100644 data/2020/neurips/An Optimal Elimination Algorithm for Learning a Best Arm
 create mode 100644 data/2020/neurips/An Unbiased Risk Estimator for Learning with Augmented Classes
 create mode 100644 data/2020/neurips/An Unsupervised Information-Theoretic Perceptual Quality Metric
 create mode 100644 data/2020/neurips/An analytic theory of shallow networks dynamics for hinge loss classification
 create mode 100644 data/2020/neurips/An efficient nonconvex reformulation of stagewise convex optimization problems
 create mode 100644 data/2020/neurips/An implicit function learning approach for parametric modal regression
 create mode 100644 data/2020/neurips/An operator view of policy gradient methods
 create mode 100644 data/2020/neurips/Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring
 create mode 100644 data/2020/neurips/Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry
 create mode 100644 data/2020/neurips/Analytical Probability Distributions and Exact Expectation-Maximization for Deep Generative Networks
 create mode 100644 data/2020/neurips/Applications of Common Entropy for Causal Inference
 create mode 100644 data/2020/neurips/Approximate Cross-Validation for Structured Models
 create mode 100644 data/2020/neurips/Approximate Cross-Validation with Low-Rank Data in High Dimensions
 create mode 100644 data/2020/neurips/Approximate Heavily-Constrained Learning with Lagrange Multiplier Models
 create mode 100644 data/2020/neurips/Approximation Based Variance Reduction for Reparameterization Gradients
 create mode 100644 data/2020/neurips/Assessing SATNet's Ability to Solve the Symbol Grounding Problem
 create mode 100644 data/2020/neurips/Assisted Learning: A Framework for Multi-Organization Learning
 create mode 100644 data/2020/neurips/Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability
 create mode 100644 data/2020/neurips/Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance
 create mode 100644 data/2020/neurips/Asymptotic normality and confidence intervals for derivatives of 2-layers neural network in the random features model
 create mode 100644 data/2020/neurips/Asymptotically Optimal Exact Minibatch Metropolis-Hastings
 create mode 100644 data/2020/neurips/Attack of the Tails: Yes, You Really Can Backdoor Federated Learning
 create mode 100644 data/2020/neurips/AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control
 create mode 100644 data/2020/neurips/Attention-Gated Brain Propagation: How the brain can implement reward-based error backpropagation
 create mode 100644 data/2020/neurips/Attribute Prototype Network for Zero-Shot Learning
 create mode 100644 data/2020/neurips/Attribution Preservation in Network Compression for Reliable Network Interpretation
 create mode 100644 data/2020/neurips/Audeo: Audio Generation for a Silent Performance Video
 create mode 100644 data/2020/neurips/Auditing Differentially Private Machine Learning: How Private is Private SGD?
 create mode 100644 data/2020/neurips/Auto Learning Attention
 create mode 100644 data/2020/neurips/Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation
 create mode 100644 data/2020/neurips/AutoBSS: An Efficient Algorithm for Block Stacking Style Search
 create mode 100644 data/2020/neurips/AutoPrivacy: Automated Layer-wise Parameter Selection for Secure Neural Network Inference
 create mode 100644 data/2020/neurips/AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning
 create mode 100644 data/2020/neurips/Autoencoders that don't overfit towards the Identity
 create mode 100644 data/2020/neurips/Autofocused oracles for model-based design
 create mode 100644 data/2020/neurips/Automatic Curriculum Learning through Value Disagreement
 create mode 100644 data/2020/neurips/Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond
 create mode 100644 data/2020/neurips/Automatically Learning Compact Quality-aware Surrogates for Optimization Problems
 create mode 100644 data/2020/neurips/Autoregressive Score Matching
 create mode 100644 data/2020/neurips/Auxiliary Task Reweighting for Minimum-data Learning
 create mode 100644 data/2020/neurips/AvE: Assistance via Empowerment
 create mode 100644 data/2020/neurips/Avoiding Side Effects By Considering Future Tasks
 create mode 100644 data/2020/neurips/Avoiding Side Effects in Complex Environments
 create mode 100644 data/2020/neurips/Axioms for Learning from Pairwise Comparisons
 create mode 100644 data/2020/neurips/BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
 create mode 100644 data/2020/neurips/BERT Loses Patience: Fast and Robust Inference with Early Exit
 create mode 100644 data/2020/neurips/BOSS: Bayesian Optimization over String Spaces
 create mode 100644 data/2020/neurips/BRP-NAS: Prediction-based NAS using GCNs
 create mode 100644 data/2020/neurips/Backpropagating Linearly Improves Transferability of Adversarial Examples
 create mode 100644 data/2020/neurips/Bad Global Minima Exist and SGD Can Reach Them
 create mode 100644 data/2020/neurips/Balanced Meta-Softmax for Long-Tailed Visual Recognition
 create mode 100644 data/2020/neurips/Bandit Linear Control
 create mode 100644 data/2020/neurips/Bandit Samplers for Training Graph Neural Networks
 create mode 100644 data/2020/neurips/BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits
 create mode 100644 data/2020/neurips/Barking up the right tree: an approach to search over molecule synthesis DAGs
 create mode 100644 data/2020/neurips/Batch normalization provably avoids ranks collapse for randomly initialised deep networks
 create mode 100644 data/2020/neurips/Batched Coarse Ranking in Multi-Armed Bandits
 create mode 100644 data/2020/neurips/Baxter Permutation Process
 create mode 100644 data/2020/neurips/BayReL: Bayesian Relational Learning for Multi-omics Data Integration
 create mode 100644 data/2020/neurips/Bayes Consistency vs. H-Consistency: The Interplay between Surrogate Loss Functions and the Scoring Function Class
 create mode 100644 data/2020/neurips/Bayesian Attention Modules
 create mode 100644 data/2020/neurips/Bayesian Bits: Unifying Quantization and Pruning
 create mode 100644 data/2020/neurips/Bayesian Causal Structural Learning with Zero-Inflated Poisson Bayesian Networks
 create mode 100644 data/2020/neurips/Bayesian Deep Ensembles via the Neural Tangent Kernel
 create mode 100644 data/2020/neurips/Bayesian Deep Learning and a Probabilistic Perspective of Generalization
 create mode 100644 data/2020/neurips/Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels
 create mode 100644 data/2020/neurips/Bayesian Multi-type Mean Field Multi-agent Imitation Learning
 create mode 100644 data/2020/neurips/Bayesian Optimization for Iterative Learning
 create mode 100644 data/2020/neurips/Bayesian Optimization of Risk Measures
 create mode 100644 data/2020/neurips/Bayesian Probabilistic Numerical Integration with Tree-Based Models
 create mode 100644 data/2020/neurips/Bayesian Pseudocoresets
 create mode 100644 data/2020/neurips/Bayesian Robust Optimization for Imitation Learning
 create mode 100644 data/2020/neurips/Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods
 create mode 100644 data/2020/neurips/Belief Propagation Neural Networks
 create mode 100644 data/2020/neurips/Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information
 create mode 100644 data/2020/neurips/Benchmarking Deep Inverse Models over time, and the Neural-Adjoint method
 create mode 100644 data/2020/neurips/Benchmarking Deep Learning Interpretability in Time Series Predictions
 create mode 100644 data/2020/neurips/Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs
 create mode 100644 data/2020/neurips/Beta R-CNN: Looking into Pedestrian Detection from Another Perspective
 create mode 100644 data/2020/neurips/Better Full-Matrix Regret via Parameter-Free Online Learning
 create mode 100644 data/2020/neurips/Better Set Representations For Relational Reasoning
 create mode 100644 data/2020/neurips/Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs
 create mode 100644 data/2020/neurips/Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses
 create mode 100644 data/2020/neurips/Beyond Lazy Training for Over-parameterized Tensor Decomposition
 create mode 100644 data/2020/neurips/Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test Examples
 create mode 100644 data/2020/neurips/Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency
 create mode 100644 data/2020/neurips/Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties
 create mode 100644 data/2020/neurips/Bi-level Score Matching for Learning Energy-based Latent Variable Models
 create mode 100644 data/2020/neurips/Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
 create mode 100644 data/2020/neurips/Bidirectional Convolutional Poisson Gamma Dynamical Systems
 create mode 100644 data/2020/neurips/Big Bird: Transformers for Longer Sequences
 create mode 100644 data/2020/neurips/Big Self-Supervised Models are Strong Semi-Supervised Learners
 create mode 100644 data/2020/neurips/Biological credit assignment through dynamic inversion of feedforward networks
 create mode 100644 data/2020/neurips/Biologically Inspired Mechanisms for Adversarial Robustness
 create mode 100644 data/2020/neurips/Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework
 create mode 100644 data/2020/neurips/Black-Box Optimization with Local Generative Surrogates
 create mode 100644 data/2020/neurips/Black-Box Ripper: Copying black-box models using generative evolutionary algorithms
 create mode 100644 data/2020/neurips/Blind Video Temporal Consistency via Deep Video Prior
 create mode 100644 data/2020/neurips/BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images
 create mode 100644 data/2020/neurips/BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization
 create mode 100644 data/2020/neurips/Boosting Adversarial Training with Hypersphere Embedding
 create mode 100644 data/2020/neurips/Boosting First-Order Methods by Shifting Objective: New Schemes with Faster Worst-Case Rates
 create mode 100644 data/2020/neurips/Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning
 create mode 100644 data/2020/neurips/Bootstrapping neural processes
 create mode 100644 data/2020/neurips/Boundary thickness and robustness in learning models
 create mode 100644 data/2020/neurips/BoxE: A Box Embedding Model for Knowledge Base Completion
 create mode 100644 data/2020/neurips/Breaking Reversibility Accelerates Langevin Dynamics for Non-Convex Optimization
 create mode 100644 data/2020/neurips/Breaking the Communication-Privacy-Accuracy Trilemma
 create mode 100644 data/2020/neurips/Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model
 create mode 100644 data/2020/neurips/Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning
 create mode 100644 data/2020/neurips/Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
 create mode 100644 data/2020/neurips/Building powerful and equivariant graph neural networks with structural message-passing
 create mode 100644 data/2020/neurips/Byzantine Resilient Distributed Multi-Task Learning
 create mode 100644 data/2020/neurips/CASTLE: Regularization via Auxiliary Causal Graph Discovery
 create mode 100644 data/2020/neurips/CHIP: A Hawkes Process Model for Continuous-time Networks with Scalable and Consistent Estimation
 create mode 100644 data/2020/neurips/CLEARER: Multi-Scale Neural Architecture Search for Image Restoration
 create mode 100644 data/2020/neurips/COBE: Contextualized Object Embeddings from Narrated Instructional Video
 create mode 100644 data/2020/neurips/COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
 create mode 100644 data/2020/neurips/COPT: Coordinated Optimal Transport on Graphs
 create mode 100644 data/2020/neurips/COT-GAN: Generating Sequential Data via Causal Optimal Transport
 create mode 100644 data/2020/neurips/CSER: Communication-efficient SGD with Error Reset
 create mode 100644 data/2020/neurips/CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances
 create mode 100644 data/2020/neurips/CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations
 create mode 100644 data/2020/neurips/Calibrated Reliable Regression using Maximum Mean Discrepancy
 create mode 100644 data/2020/neurips/Calibrating CNNs for Lifelong Learning
 create mode 100644 data/2020/neurips/Calibrating Deep Neural Networks using Focal Loss
 create mode 100644 data/2020/neurips/Calibration of Shared Equilibria in General Sum Partially Observable Markov Games
 create mode 100644 data/2020/neurips/Can Graph Neural Networks Count Substructures?
 create mode 100644 data/2020/neurips/Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference
 create mode 100644 data/2020/neurips/Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study
 create mode 100644 data/2020/neurips/Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?
 create mode 100644 data/2020/neurips/Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
 create mode 100644 data/2020/neurips/Can the Brain Do Backpropagation? - Exact Implementation of Backpropagation in Predictive Coding Networks
 create mode 100644 data/2020/neurips/Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction
 create mode 100644 data/2020/neurips/Cascaded Text Generation with Markov Transformers
 create mode 100644 data/2020/neurips/Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning
 create mode 100644 data/2020/neurips/Causal Discovery in Physical Systems from Videos
 create mode 100644 data/2020/neurips/Causal Estimation with Functional Confounders
 create mode 100644 data/2020/neurips/Causal Intervention for Weakly-Supervised Semantic Segmentation
 create mode 100644 data/2020/neurips/Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models
 create mode 100644 data/2020/neurips/Causal analysis of Covid-19 Spread in Germany
 create mode 100644 data/2020/neurips/Certifiably Adversarially Robust Detection of Out-of-Distribution Data
 create mode 100644 data/2020/neurips/Certified Defense to Image Transformations via Randomized Smoothing
 create mode 100644 data/2020/neurips/Certified Monotonic Neural Networks
 create mode 100644 data/2020/neurips/Certified Robustness of Graph Convolution Networks for Graph Classification under Topological Attacks
 create mode 100644 data/2020/neurips/Certifying Confidence via Randomized Smoothing
 create mode 100644 data/2020/neurips/Certifying Strategyproof Auction Networks
 create mode 100644 data/2020/neurips/Chaos, Extremism and Optimism: Volume Analysis of Learning in Games
 create mode 100644 data/2020/neurips/Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe
 create mode 100644 data/2020/neurips/Characterizing emergent representations in a space of candidate learning rules for deep networks
 create mode 100644 data/2020/neurips/Choice Bandits
 create mode 100644 data/2020/neurips/CircleGAN: Generative Adversarial Learning across Spherical Circles
 create mode 100644 data/2020/neurips/Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability
 create mode 100644 data/2020/neurips/Classification with Valid and Adaptive Coverage
 create mode 100644 data/2020/neurips/Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow
 create mode 100644 data/2020/neurips/Co-Tuning for Transfer Learning
 create mode 100644 data/2020/neurips/Co-exposure Maximization in Online Social Networks
 create mode 100644 data/2020/neurips/CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection
 create mode 100644 data/2020/neurips/CoMIR: Contrastive Multimodal Image Representation for Registration
 create mode 100644 data/2020/neurips/CoSE: Compositional Stroke Embeddings
 create mode 100644 data/2020/neurips/CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching
 create mode 100644 data/2020/neurips/Coded Sequential Matrix Multiplication For Straggler Mitigation
 create mode 100644 data/2020/neurips/CogLTX: Applying BERT to Long Texts
 create mode 100644 data/2020/neurips/CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models
 create mode 100644 data/2020/neurips/Coherent Hierarchical Multi-Label Classification Networks
 create mode 100644 data/2020/neurips/CoinDICE: Off-Policy Confidence Interval Estimation
 create mode 100644 data/2020/neurips/CoinPress: Practical Private Mean and Covariance Estimation
 create mode 100644 data/2020/neurips/ColdGANs: Taming Language GANs with Cautious Sampling Strategies
 create mode 100644 data/2020/neurips/Collapsing Bandits and Their Application to Public Health Intervention
 create mode 100644 data/2020/neurips/Collegial Ensembles
 create mode 100644 data/2020/neurips/Color Visual Illusions: A Statistics-based Computational Model
 create mode 100644 data/2020/neurips/Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
 create mode 100644 data/2020/neurips/Community detection in sparse time-evolving graphs with a dynamical Bethe-Hessian
 create mode 100644 "data/2020/neurips/Community detection using fast low-cardinality semidefinite programming\342\200\251"
 create mode 100644 data/2020/neurips/Compact task representations as a normative model for higher-order brain activity
 create mode 100644 data/2020/neurips/Comparator-Adaptive Convex Bandits
 create mode 100644 data/2020/neurips/Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval
 create mode 100644 data/2020/neurips/Compositional Explanations of Neurons
 create mode 100644 data/2020/neurips/Compositional Generalization by Learning Analytical Expressions
 create mode 100644 data/2020/neurips/Compositional Generalization via Neural-Symbolic Stack Machines
 create mode 100644 data/2020/neurips/Compositional Visual Generation with Energy Based Models
 create mode 100644 data/2020/neurips/Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition
 create mode 100644 data/2020/neurips/Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection
 create mode 100644 data/2020/neurips/Compressing Images by Encoding Their Latent Representations with Relative Entropy Coding
 create mode 100644 data/2020/neurips/Computing Valid p-value for Optimal Changepoint by Selective Inference using Dynamic Programming
 create mode 100644 data/2020/neurips/Conditioning and Processing: Techniques to Improve Information-Theoretic Generalization Bounds
 create mode 100644 data/2020/neurips/Confidence sequences for sampling without replacement
 create mode 100644 data/2020/neurips/Conformal Symplectic and Relativistic Optimization
 create mode 100644 data/2020/neurips/Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning
 create mode 100644 data/2020/neurips/Conic Descent and its Application to Memory-efficient Optimization over Positive Semidefinite Matrices
 create mode 100644 data/2020/neurips/Consequences of Misaligned AI
 create mode 100644 data/2020/neurips/Conservative Q-Learning for Offline Reinforcement Learning
 create mode 100644 data/2020/neurips/Consistency Regularization for Certified Robustness of Smoothed Classifiers
 create mode 100644 data/2020/neurips/Consistent Estimation of Identifiable Nonparametric Mixture Models from Grouped Observations
 create mode 100644 data/2020/neurips/Consistent Plug-in Classifiers for Complex Objectives and Constraints
 create mode 100644 data/2020/neurips/Consistent Structural Relation Learning for Zero-Shot Segmentation
 create mode 100644 data/2020/neurips/Consistent feature selection for analytic deep neural networks
 create mode 100644 data/2020/neurips/Constant-Expansion Suffices for Compressed Sensing with Generative Priors
 create mode 100644 data/2020/neurips/Constrained episodic reinforcement learning in concave-convex and knapsack settings
 create mode 100644 data/2020/neurips/Constraining Variational Inference with Geometric Jensen-Shannon Divergence
 create mode 100644 data/2020/neurips/Content Provider Dynamics and Coordination in Recommendation Ecosystems
 create mode 100644 data/2020/neurips/Contextual Games: Multi-Agent Learning with Side Information
 create mode 100644 data/2020/neurips/Contextual Reserve Price Optimization in Auctions via Mixed Integer Programming
 create mode 100644 data/2020/neurips/Continual Deep Learning by Functional Regularisation of Memorable Past
 create mode 100644 data/2020/neurips/Continual Learning in Low-rank Orthogonal Subspaces
 create mode 100644 data/2020/neurips/Continual Learning of Control Primitives : Skill Discovery via Reset-Games
 create mode 100644 data/2020/neurips/Continual Learning of a Mixed Sequence of Similar and Dissimilar Tasks
 create mode 100644 data/2020/neurips/Continual Learning with Node-Importance based Adaptive Group Sparse Regularization
 create mode 100644 data/2020/neurips/Continuous Meta-Learning without Tasks
 create mode 100644 data/2020/neurips/Continuous Object Representation Networks: Novel View Synthesis without Target View Supervision
 create mode 100644 data/2020/neurips/Continuous Regularized Wasserstein Barycenters
 create mode 100644 data/2020/neurips/Continuous Submodular Maximization: Beyond DR-Submodularity
 create mode 100644 data/2020/neurips/Continuous Surface Embeddings
 create mode 100644 data/2020/neurips/ContraGAN: Contrastive Learning for Conditional Image Generation
 create mode 100644 data/2020/neurips/Contrastive Learning with Adversarial Examples
 create mode 100644 data/2020/neurips/Contrastive learning of global and local features for medical image segmentation with limited annotations
 create mode 100644 data/2020/neurips/ConvBERT: Improving BERT with Span-based Dynamic Convolution
 create mode 100644 data/2020/neurips/Convergence and Stability of Graph Convolutional Networks on Large Random Graphs
 create mode 100644 data/2020/neurips/Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters
 create mode 100644 data/2020/neurips/Convex optimization based on global lower second-order models
 create mode 100644 data/2020/neurips/Convolutional Generation of Textured 3D Meshes
 create mode 100644 data/2020/neurips/Convolutional Tensor-Train LSTM for Spatio-Temporal Learning
 create mode 100644 data/2020/neurips/Cooperative Heterogeneous Deep Reinforcement Learning
 create mode 100644 data/2020/neurips/Cooperative Multi-player Bandit Optimization
 create mode 100644 data/2020/neurips/Coresets for Near-Convex Functions
 create mode 100644 data/2020/neurips/Coresets for Regressions with Panel Data
 create mode 100644 data/2020/neurips/Coresets for Robust Training of Deep Neural Networks against Noisy Labels
 create mode 100644 data/2020/neurips/Coresets via Bilevel Optimization for Continual Learning and Streaming
 create mode 100644 data/2020/neurips/Correlation Robust Influence Maximization
 create mode 100644 data/2020/neurips/Correspondence learning via linearly-invariant embedding
 create mode 100644 data/2020/neurips/Counterexample-Guided Learning of Monotonic Neural Networks
 create mode 100644 data/2020/neurips/Counterfactual Contrastive Learning for Weakly-Supervised Vision-Language Grounding
 create mode 100644 data/2020/neurips/Counterfactual Data Augmentation using Locally Factored Dynamics
 create mode 100644 data/2020/neurips/Counterfactual Prediction for Bundle Treatment
 create mode 100644 data/2020/neurips/Counterfactual Predictions under Runtime Confounding
 create mode 100644 data/2020/neurips/Counterfactual Vision-and-Language Navigation: Unravelling the Unseen
 create mode 100644 data/2020/neurips/Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators
 create mode 100644 data/2020/neurips/Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
 create mode 100644 data/2020/neurips/Critic Regularized Regression
 create mode 100644 data/2020/neurips/Cross-Scale Internal Graph Neural Network for Image Super-Resolution
 create mode 100644 data/2020/neurips/Cross-lingual Retrieval for Iterative Self-Supervised Training
 create mode 100644 data/2020/neurips/Cross-validation Confidence Intervals for Test Error
 create mode 100644 data/2020/neurips/CrossTransformers: spatially-aware few-shot transfer
 create mode 100644 data/2020/neurips/Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality
 create mode 100644 data/2020/neurips/Curriculum By Smoothing
 create mode 100644 data/2020/neurips/Curriculum Learning by Dynamic Instance Hardness
 create mode 100644 data/2020/neurips/Curriculum learning for multilevel budgeted combinatorial problems
 create mode 100644 data/2020/neurips/Curvature Regularization to Prevent Distortion in Graph Embedding
 create mode 100644 data/2020/neurips/Cycle-Contrast for Self-Supervised Video Representation Learning
 create mode 100644 data/2020/neurips/DAGs with No Fears: A Closer Look at Continuous Optimization for Learning Bayesian Networks
 create mode 100644 data/2020/neurips/DISK: Learning local features with policy gradient
 create mode 100644 data/2020/neurips/DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles
 create mode 100644 data/2020/neurips/Dark Experience for General Continual Learning: a Strong, Simple Baseline
 create mode 100644 data/2020/neurips/Data Diversification: A Simple Strategy For Neural Machine Translation
 create mode 100644 data/2020/neurips/De-Anonymizing Text by Fingerprinting Language Generation
 create mode 100644 data/2020/neurips/Debiased Contrastive Learning
 create mode 100644 data/2020/neurips/Debiasing Averaged Stochastic Gradient Descent to handle missing values
 create mode 100644 data/2020/neurips/Debiasing Distributed Second Order Optimization with Surrogate Sketching and Scaled Regularization
 create mode 100644 data/2020/neurips/Debugging Tests for Model Explanations
 create mode 100644 data/2020/neurips/Decentralized Accelerated Proximal Gradient Descent
 create mode 100644 data/2020/neurips/Decentralized Langevin Dynamics for Bayesian Learning
 create mode 100644 data/2020/neurips/Decentralized TD Tracking with Linear Function Approximation and its Finite-Time Analysis
 create mode 100644 data/2020/neurips/Decision trees as partitioning machines to characterize their generalization properties
 create mode 100644 data/2020/neurips/Decision-Making with Auto-Encoding Variational Bayes
 create mode 100644 data/2020/neurips/Decisions, Counterfactual Explanations and Strategic Behavior
 create mode 100644 data/2020/neurips/Deep Archimedean Copulas
 create mode 100644 data/2020/neurips/Deep Automodulators
 create mode 100644 data/2020/neurips/Deep Diffusion-Invariant Wasserstein Distributional Classification
 create mode 100644 data/2020/neurips/Deep Direct Likelihood Knockoffs
 create mode 100644 data/2020/neurips/Deep Energy-based Modeling of Discrete-Time Physics
 create mode 100644 data/2020/neurips/Deep Evidential Regression
 create mode 100644 data/2020/neurips/Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking
 create mode 100644 data/2020/neurips/Deep Imitation Learning for Bimanual Robotic Manipulation
 create mode 100644 data/2020/neurips/Deep Metric Learning with Spherical Embedding
 create mode 100644 data/2020/neurips/Deep Multimodal Fusion by Channel Exchanging
 create mode 100644 data/2020/neurips/Deep Rao-Blackwellised Particle Filters for Time Series Forecasting
 create mode 100644 data/2020/neurips/Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
 create mode 100644 data/2020/neurips/Deep Reinforcement and InfoMax Learning
 create mode 100644 data/2020/neurips/Deep Relational Topic Modeling via Graph Poisson Gamma Belief Network
 create mode 100644 data/2020/neurips/Deep Shells: Unsupervised Shape Correspondence with Optimal Transport
 create mode 100644 data/2020/neurips/Deep Smoothing of the Implied Volatility Surface
 create mode 100644 data/2020/neurips/Deep Statistical Solvers
 create mode 100644 data/2020/neurips/Deep Structural Causal Models for Tractable Counterfactual Inference
 create mode 100644 data/2020/neurips/Deep Subspace Clustering with Data Augmentation
 create mode 100644 data/2020/neurips/Deep Transformation-Invariant Clustering
 create mode 100644 data/2020/neurips/Deep Transformers with Latent Depth
 create mode 100644 data/2020/neurips/Deep Variational Instance Segmentation
 create mode 100644 data/2020/neurips/Deep Wiener Deconvolution: Wiener Meets Deep Learning for Image Deblurring
 create mode 100644 data/2020/neurips/Deep active inference agents using Monte-Carlo methods
 create mode 100644 data/2020/neurips/Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
 create mode 100644 data/2020/neurips/Deep reconstruction of strange attractors from time series
 create mode 100644 data/2020/neurips/DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs
 create mode 100644 data/2020/neurips/DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation
 create mode 100644 data/2020/neurips/Deeply Learned Spectral Total Variation Decomposition
 create mode 100644 data/2020/neurips/Delay and Cooperation in Nonstochastic Linear Bandits
 create mode 100644 data/2020/neurips/Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians
 create mode 100644 data/2020/neurips/Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation
 create mode 100644 data/2020/neurips/Demixed shared component analysis of neural population data from multiple brain areas
 create mode 100644 data/2020/neurips/Demystifying Orthogonal Monte Carlo and Beyond
 create mode 100644 data/2020/neurips/Denoised Smoothing: A Provable Defense for Pretrained Classifiers
 create mode 100644 data/2020/neurips/Denoising Diffusion Probabilistic Models
 create mode 100644 data/2020/neurips/Dense Correspondences between Human Bodies via Learning Transformation Synchronization on Graphs
 create mode 100644 data/2020/neurips/Depth Uncertainty in Neural Networks
 create mode 100644 data/2020/neurips/Design Space for Graph Neural Networks
 create mode 100644 data/2020/neurips/Detecting Hands and Recognizing Physical Contact in the Wild
 create mode 100644 data/2020/neurips/Detecting Interactions from Neural Networks via Topological Analysis
 create mode 100644 data/2020/neurips/Detection as Regression: Certified Object Detection with Median Smoothing
 create mode 100644 data/2020/neurips/Deterministic Approximation for Submodular Maximization over a Matroid in Nearly Linear Time
 create mode 100644 data/2020/neurips/Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
 create mode 100644 data/2020/neurips/DiffGCN: Graph Convolutional Networks via Differential Operators and Algebraic Multigrid Pooling
 create mode 100644 data/2020/neurips/Differentiable Augmentation for Data-Efficient GAN Training
 create mode 100644 data/2020/neurips/Differentiable Causal Discovery from Interventional Data
 create mode 100644 data/2020/neurips/Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization
 create mode 100644 data/2020/neurips/Differentiable Meta-Learning of Bandit Policies
 create mode 100644 data/2020/neurips/Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement
 create mode 100644 data/2020/neurips/Differentiable Top-k with Optimal Transport
 create mode 100644 data/2020/neurips/Differentially Private Clustering: Tight Approximation Ratios
 create mode 100644 data/2020/neurips/Differentially-Private Federated Linear Bandits
 create mode 100644 data/2020/neurips/Digraph Inception Convolutional Networks
 create mode 100644 data/2020/neurips/Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures
 create mode 100644 data/2020/neurips/Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
 create mode 100644 data/2020/neurips/Directional Pruning of Deep Neural Networks
 create mode 100644 data/2020/neurips/Directional convergence and alignment in deep learning
 create mode 100644 data/2020/neurips/Dirichlet Graph Variational Autoencoder
 create mode 100644 data/2020/neurips/DisARM: An Antithetic Gradient Estimator for Binary Latent Variables
 create mode 100644 data/2020/neurips/DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
 create mode 100644 data/2020/neurips/Discovering Reinforcement Learning Algorithms
 create mode 100644 data/2020/neurips/Discovering Symbolic Models from Deep Learning with Inductive Biases
 create mode 100644 data/2020/neurips/Discovering conflicting groups in signed networks
 create mode 100644 data/2020/neurips/Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
 create mode 100644 data/2020/neurips/Disentangling Human Error from Ground Truth in Segmentation of Medical Images
 create mode 100644 data/2020/neurips/Disentangling by Subspace Diffusion
 create mode 100644 data/2020/neurips/Displacement-Invariant Matching Cost Learning for Accurate Optical Flow Estimation
 create mode 100644 data/2020/neurips/Dissecting Neural ODEs
 create mode 100644 data/2020/neurips/Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning
 create mode 100644 data/2020/neurips/Distributed Distillation for On-Device Learning
 create mode 100644 data/2020/neurips/Distributed Newton Can Communicate Less and Resist Byzantine Workers
 create mode 100644 data/2020/neurips/Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms
 create mode 100644 data/2020/neurips/Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning
 create mode 100644 data/2020/neurips/Distribution Matching for Crowd Counting
 create mode 100644 data/2020/neurips/Distribution-free binary classification: prediction sets, confidence intervals and calibration
 create mode 100644 data/2020/neurips/Distributional Robustness with IPMs and links to Regularization and GANs
 create mode 100644 data/2020/neurips/Distributionally Robust Federated Averaging
 create mode 100644 data/2020/neurips/Distributionally Robust Local Non-parametric Conditional Estimation
 create mode 100644 data/2020/neurips/Distributionally Robust Parametric Maximum Likelihood Estimation
 create mode 100644 data/2020/neurips/Diverse Image Captioning with Context-Object Split Latent Spaces
 create mode 100644 data/2020/neurips/Diversity can be Transferred: Output Diversification for White- and Black-box Attacks
 create mode 100644 data/2020/neurips/Diversity-Guided Multi-Objective Bayesian Optimization With Batch Evaluations
 create mode 100644 data/2020/neurips/Do Adversarially Robust ImageNet Models Transfer Better?
 create mode 100644 data/2020/neurips/Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?
 create mode 100644 data/2020/neurips/Domain Adaptation as a Problem of Inference on Graphical Models
 create mode 100644 data/2020/neurips/Domain Adaptation with Conditional Distribution Matching and Generalized Label Shift
 create mode 100644 data/2020/neurips/Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization
 create mode 100644 data/2020/neurips/Domain Generalization via Entropy Regularization
 create mode 100644 data/2020/neurips/Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
 create mode 100644 data/2020/neurips/Dual Instrumental Variable Regression
 create mode 100644 data/2020/neurips/Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks
 create mode 100644 data/2020/neurips/Dual T: Reducing Estimation Error for Transition Matrix in Label-noise Learning
 create mode 100644 data/2020/neurips/Dual-Free Stochastic Decentralized Optimization with Variance Reduction
 create mode 100644 data/2020/neurips/Dual-Resolution Correspondence Networks
 create mode 100644 data/2020/neurips/Duality-Induced Regularizer for Tensor Factorization Based Knowledge Graph Completion
 create mode 100644 data/2020/neurips/DynaBERT: Dynamic BERT with Adaptive Width and Depth
 create mode 100644 data/2020/neurips/Dynamic Fusion of Eye Movement Data and Verbal Narrations in Knowledge-rich Domains
 create mode 100644 data/2020/neurips/Dynamic Regret of Convex and Smooth Functions
 create mode 100644 data/2020/neurips/Dynamic Regret of Policy Optimization in Non-Stationary Environments
 create mode 100644 data/2020/neurips/Dynamic Submodular Maximization
 create mode 100644 data/2020/neurips/Dynamic allocation of limited memory resources in reinforcement learning
 create mode 100644 data/2020/neurips/Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification
 create mode 100644 data/2020/neurips/Early-Learning Regularization Prevents Memorization of Noisy Labels
 create mode 100644 data/2020/neurips/EcoLight: Intersection Control in Developing Regions Under Extreme Budget and Network Constraints
 create mode 100644 data/2020/neurips/Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization
 create mode 100644 data/2020/neurips/Effective Diversity in Population Based Reinforcement Learning
 create mode 100644 data/2020/neurips/Efficient Algorithms for Device Placement of DNN Graph Operators
 create mode 100644 data/2020/neurips/Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut
 create mode 100644 data/2020/neurips/Efficient Clustering for Stretched Mixtures: Landscape and Optimality
 create mode 100644 data/2020/neurips/Efficient Contextual Bandits with Continuous Actions
 create mode 100644 data/2020/neurips/Efficient Distance Approximation for Structured High-Dimensional Distributions via Learning
 create mode 100644 data/2020/neurips/Efficient Exact Verification of Binarized Neural Networks
 create mode 100644 data/2020/neurips/Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization
 create mode 100644 data/2020/neurips/Efficient Generation of Structured Objects with Constrained Adversarial Networks
 create mode 100644 data/2020/neurips/Efficient Learning of Discrete Graphical Models
 create mode 100644 data/2020/neurips/Efficient Learning of Generative Models via Finite-Difference Score Matching
 create mode 100644 data/2020/neurips/Efficient Low Rank Gaussian Variational Inference for Neural Networks
 create mode 100644 data/2020/neurips/Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity
 create mode 100644 data/2020/neurips/Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning
 create mode 100644 data/2020/neurips/Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees
 create mode 100644 data/2020/neurips/Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent
 create mode 100644 data/2020/neurips/Efficient Planning in Large MDPs with Weak Linear Function Approximation
 create mode 100644 data/2020/neurips/Efficient Projection-free Algorithms for Saddle Point Problems
 create mode 100644 data/2020/neurips/Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee
 create mode 100644 data/2020/neurips/Efficient active learning of sparse halfspaces with arbitrary bounded noise
 create mode 100644 data/2020/neurips/Efficient estimation of neural tuning during naturalistic behavior
 create mode 100644 data/2020/neurips/Efficient semidefinite-programming-based inference for binary and multi-class MRFs
 create mode 100644 data/2020/neurips/Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data
 create mode 100644 data/2020/neurips/Election Coding for Distributed Learning: Protecting SignSGD against Byzantine Attacks
 create mode 100644 data/2020/neurips/Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
 create mode 100644 data/2020/neurips/Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences
 create mode 100644 data/2020/neurips/Empirical Likelihood for Contextual Bandits
 create mode 100644 data/2020/neurips/Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
 create mode 100644 data/2020/neurips/End-to-End Learning and Intervention in Games
 create mode 100644 data/2020/neurips/Energy-based Out-of-distribution Detection
 create mode 100644 data/2020/neurips/Ensemble Distillation for Robust Model Fusion in Federated Learning
 create mode 100644 data/2020/neurips/Ensembling geophysical models with Bayesian Neural Networks
 create mode 100644 data/2020/neurips/Ensuring Fairness Beyond the Training Data
 create mode 100644 data/2020/neurips/Entropic Causal Inference: Identifiability and Finite Sample Results
 create mode 100644 data/2020/neurips/Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form
 create mode 100644 data/2020/neurips/Entrywise convergence of iterative methods for eigenproblems
 create mode 100644 data/2020/neurips/Equivariant Networks for Hierarchical Structures
 create mode 100644 data/2020/neurips/Erdos Goes Neural: an Unsupervised Learning Framework for Combinatorial Optimization on Graphs
 create mode 100644 data/2020/neurips/Error Bounds of Imitating Policies and Environments
 create mode 100644 data/2020/neurips/Escaping Saddle-Point Faster under Interpolation-like Conditions
 create mode 100644 data/2020/neurips/Escaping the Gravitational Pull of Softmax
 create mode 100644 data/2020/neurips/Estimating Fluctuations in Neural Representations of Uncertain Environments
 create mode 100644 data/2020/neurips/Estimating Rank-One Spikes from Heavy-Tailed Noise via Self-Avoiding Walks
 create mode 100644 data/2020/neurips/Estimating Training Data Influence by Tracing Gradient Descent
 create mode 100644 data/2020/neurips/Estimating decision tree learnability with polylogarithmic sample complexity
 create mode 100644 data/2020/neurips/Estimating the Effects of Continuous-valued Interventions using Generative Adversarial Networks
 create mode 100644 data/2020/neurips/Estimating weighted areas under the ROC curve
 create mode 100644 data/2020/neurips/Estimation of Skill Distribution from a Tournament
 create mode 100644 data/2020/neurips/Evaluating Attribution for Graph Neural Networks
 create mode 100644 data/2020/neurips/Evaluating and Rewarding Teamwork Using Cooperative Game Abstractions
 create mode 100644 data/2020/neurips/Every View Counts: Cross-View Consistency in 3D Object Detection with Hybrid-Cylindrical-Spherical Voxelization
 create mode 100644 data/2020/neurips/Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders
 create mode 100644 data/2020/neurips/EvolveGraph: Multi-Agent Trajectory Prediction with Dynamic Relational Reasoning
 create mode 100644 data/2020/neurips/Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation
 create mode 100644 data/2020/neurips/Evolving Normalization-Activation Layers
 create mode 100644 data/2020/neurips/Exact Recovery of Mangled Clusters with Same-Cluster Queries
 create mode 100644 data/2020/neurips/Exact expressions for double descent and implicit regularization via surrogate random design
 create mode 100644 data/2020/neurips/Exactly Computing the Local Lipschitz Constant of ReLU Networks
 create mode 100644 data/2020/neurips/Exchangeable Neural ODE for Set Modeling
 create mode 100644 data/2020/neurips/Exemplar Guided Active Learning
 create mode 100644 data/2020/neurips/Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation
 create mode 100644 data/2020/neurips/ExpandNets: Linear Over-parameterization to Train Compact Convolutional Networks
 create mode 100644 data/2020/neurips/Experimental design for MRI by greedy policy search
 create mode 100644 data/2020/neurips/Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
 create mode 100644 data/2020/neurips/Explainable Voting
 create mode 100644 data/2020/neurips/Explaining Naive Bayes and Other Linear Classifiers with Polynomial Time and Delay
 create mode 100644 data/2020/neurips/Explicit Regularisation in Gaussian Noise Injections
 create mode 100644 data/2020/neurips/Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits
 create mode 100644 data/2020/neurips/Exploiting MMD and Sinkhorn Divergences for Fair and Transferable Representation Learning
 create mode 100644 data/2020/neurips/Exploiting the Surrogate Gap in Online Multiclass Classification
 create mode 100644 data/2020/neurips/Exploiting weakly supervised visual patterns to learn from partial annotations
 create mode 100644 data/2020/neurips/Explore Aggressively, Update Conservatively: Stochastic Extragradient Methods with Variable Stepsize Scaling
 create mode 100644 data/2020/neurips/Extrapolation Towards Imaginary 0-Nearest Neighbour and Its Improved Convergence Rate
 create mode 100644 data/2020/neurips/FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
 create mode 100644 data/2020/neurips/Factor Graph Grammars
 create mode 100644 data/2020/neurips/Factor Graph Neural Networks
 create mode 100644 data/2020/neurips/Factorizable Graph Convolutional Networks
 create mode 100644 data/2020/neurips/Factorized Neural Processes for Neural Processes: K-Shot Prediction of Neural Responses
 create mode 100644 data/2020/neurips/Fair Hierarchical Clustering
 create mode 100644 data/2020/neurips/Fair Multiple Decision Making Through Soft Interventions
 create mode 100644 data/2020/neurips/Fair Performance Metric Elicitation
 create mode 100644 data/2020/neurips/Fair regression via plug-in estimator and recalibration with statistical guarantees
 create mode 100644 data/2020/neurips/Fair regression with Wasserstein barycenters
 create mode 100644 data/2020/neurips/Fairness constraints can help exact inference in structured prediction
 create mode 100644 data/2020/neurips/Fairness in Streaming Submodular Maximization: Algorithms and Hardness
 create mode 100644 data/2020/neurips/Fairness with Overlapping Groups; a Probabilistic Perspective
 create mode 100644 data/2020/neurips/Fairness without Demographics through Adversarially Reweighted Learning
 create mode 100644 data/2020/neurips/Faithful Embeddings for Knowledge Base Queries
 create mode 100644 data/2020/neurips/Falcon: Fast Spectral Inference on Encrypted Data
 create mode 100644 data/2020/neurips/Fast Adaptive Non-Monotone Submodular Maximization Subject to a Knapsack Constraint
 create mode 100644 data/2020/neurips/Fast Convergence of Langevin Dynamics on Manifold: Geodesics meet Log-Sobolev
 create mode 100644 data/2020/neurips/Fast Epigraphical Projection-based Incremental Algorithms for Wasserstein Distributionally Robust Support Vector Machine
 create mode 100644 data/2020/neurips/Fast Fourier Convolution
 create mode 100644 data/2020/neurips/Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization
 create mode 100644 data/2020/neurips/Fast Transformers with Clustered Attention
 create mode 100644 data/2020/neurips/Fast Unbalanced Optimal Transport on a Tree
 create mode 100644 data/2020/neurips/Fast and Accurate $k$-means++ via Rejection Sampling
 create mode 100644 data/2020/neurips/Fast and Flexible Temporal Point Processes with Triangular Maps
 create mode 100644 data/2020/neurips/Fast geometric learning with symbolic matrices
 create mode 100644 data/2020/neurips/Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation
 create mode 100644 data/2020/neurips/Faster DBSCAN via subsampled similarity queries
 create mode 100644 "data/2020/neurips/Faster Differentially Private Samplers via R\303\251nyi Divergence Analysis of Discretized Langevin MCMC"
 create mode 100644 data/2020/neurips/Faster Randomized Infeasible Interior Point Methods for Tall Wide Linear Programs
 create mode 100644 data/2020/neurips/Faster Wasserstein Distance Estimation with the Sinkhorn Divergence
 create mode 100644 data/2020/neurips/Feature Importance Ranking for Deep Learning
 create mode 100644 data/2020/neurips/Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests
 create mode 100644 data/2020/neurips/FedSplit: an algorithmic framework for fast federated optimization
 create mode 100644 data/2020/neurips/Federated Accelerated Stochastic Gradient Descent
 create mode 100644 data/2020/neurips/Federated Bayesian Optimization via Thompson Sampling
 create mode 100644 data/2020/neurips/Federated Principal Component Analysis
 create mode 100644 data/2020/neurips/Few-Cost Salient Object Detection with Adversarial-Paced Learning
 create mode 100644 data/2020/neurips/Few-shot Image Generation with Elastic Weight Consolidation
 create mode 100644 data/2020/neurips/Few-shot Visual Reasoning with Meta-Analogical Contrastive Learning
 create mode 100644 data/2020/neurips/Fewer is More: A Deep Graph Metric Learning Perspective Using Fewer Proxies
 create mode 100644 data/2020/neurips/Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications
 create mode 100644 data/2020/neurips/Field-wise Learning for Multi-field Categorical Data
 create mode 100644 data/2020/neurips/Fighting Copycat Agents in Behavioral Cloning from Observation Histories
 create mode 100644 data/2020/neurips/Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems
 create mode 100644 data/2020/neurips/Finding the Homology of Decision Boundaries with Active Learning
 create mode 100644 data/2020/neurips/Fine-Grained Dynamic Head for Object Detection
 create mode 100644 data/2020/neurips/Finer Metagenomic Reconstruction via Biodiversity Optimization
 create mode 100644 data/2020/neurips/Finite Continuum-Armed Bandits
 create mode 100644 data/2020/neurips/Finite Versus Infinite Neural Networks: an Empirical Study
 create mode 100644 data/2020/neurips/Finite-Time Analysis for Double Q-learning
 create mode 100644 data/2020/neurips/Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards
 create mode 100644 data/2020/neurips/Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
 create mode 100644 data/2020/neurips/First Order Constrained Optimization in Policy Space
 create mode 100644 data/2020/neurips/First-Order Methods for Large-Scale Market Equilibrium Computation
 create mode 100644 data/2020/neurips/FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
 create mode 100644 data/2020/neurips/Fixed-Support Wasserstein Barycenters: Computational Hardness and Fast Algorithm
 create mode 100644 data/2020/neurips/FleXOR: Trainable Fractional Quantization
 create mode 100644 data/2020/neurips/Flexible mean field variational inference using mixtures of non-overlapping exponential families
 create mode 100644 data/2020/neurips/Flows for simultaneous manifold learning and density estimation
 create mode 100644 data/2020/neurips/Focus of Attention Improves Information Transfer in Visual Features
 create mode 100644 data/2020/neurips/Follow the Perturbed Leader: Optimism and Fast Parallel Algorithms for Smooth Minimax Games
 create mode 100644 data/2020/neurips/Forethought and Hindsight in Credit Assignment
 create mode 100644 data/2020/neurips/Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes
 create mode 100644 data/2020/neurips/Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
 create mode 100644 data/2020/neurips/Fourier Sparse Leverage Scores and Approximate Kernel Learning
 create mode 100644 data/2020/neurips/Fourier Spectrum Discrepancies in Deep Network Generated Images
 create mode 100644 data/2020/neurips/Fourier-transform-based attribution priors improve the interpretability and stability of deep learning models for genomics
 create mode 100644 data/2020/neurips/FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
 create mode 100644 data/2020/neurips/From Boltzmann Machines to Neural Networks and Back Again
 create mode 100644 data/2020/neurips/From Predictions to Decisions: Using Lookahead Regularization
 create mode 100644 data/2020/neurips/From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering
 create mode 100644 data/2020/neurips/FrugalML: How to use ML Prediction APIs more accurately and cheaply
 create mode 100644 data/2020/neurips/Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels
 create mode 100644 data/2020/neurips/Fully Dynamic Algorithm for Constrained Submodular Optimization
 create mode 100644 data/2020/neurips/Functional Regularization for Representation Learning: A Unified Theoretical Perspective
 create mode 100644 data/2020/neurips/Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
 create mode 100644 data/2020/neurips/Further Analysis of Outlier Detection with Deep Generative Models
 create mode 100644 data/2020/neurips/GAIT-prop: A biologically plausible learning rule derived from backpropagation of error
 create mode 100644 data/2020/neurips/GAN Memory with No Forgetting
 create mode 100644 data/2020/neurips/GANSpace: Discovering Interpretable GAN Controls
 create mode 100644 "data/2020/neurips/GCN meets GPU: Decoupling \"When to Sample\" from \"How to Sample\""
 create mode 100644 data/2020/neurips/GCOMB: Learning Budget-constrained Combinatorial Algorithms over Billion-sized Graphs
 create mode 100644 data/2020/neurips/GNNGuard: Defending Graph Neural Networks against Adversarial Attacks
 create mode 100644 data/2020/neurips/GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network
 create mode 100644 data/2020/neurips/GPS-Net: Graph-based Photometric Stereo Network
 create mode 100644 data/2020/neurips/GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification
 create mode 100644 data/2020/neurips/GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis
 create mode 100644 data/2020/neurips/GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators
 create mode 100644 data/2020/neurips/Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction
 create mode 100644 data/2020/neurips/Gaussian Gated Linear Networks
 create mode 100644 data/2020/neurips/Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective
 create mode 100644 data/2020/neurips/General Control Functions for Causal Effect Estimation from IVs
 create mode 100644 data/2020/neurips/General Transportability of Soft Interventions: Completeness Results
 create mode 100644 data/2020/neurips/Generalised Bayesian Filtering via Sequential Monte Carlo
 create mode 100644 data/2020/neurips/Generalization Bound of Gradient Descent for Non-Convex Metric Learning
 create mode 100644 data/2020/neurips/Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics
 create mode 100644 data/2020/neurips/Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization
 create mode 100644 data/2020/neurips/Generalized Boosting
 create mode 100644 data/2020/neurips/Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection
 create mode 100644 data/2020/neurips/Generalized Hindsight for Reinforcement Learning
 create mode 100644 data/2020/neurips/Generalized Independent Noise Condition for Estimating Latent Variable Causal Graphs
 create mode 100644 data/2020/neurips/Generalized Leverage Score Sampling for Neural Networks
 create mode 100644 data/2020/neurips/Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning
 create mode 100644 data/2020/neurips/Generating Correct Answers for Progressive Matrices Intelligence Tests
 create mode 100644 data/2020/neurips/Generative 3D Part Assembly via Dynamic Graph Learning
 create mode 100644 data/2020/neurips/Generative Neurosymbolic Machines
 create mode 100644 data/2020/neurips/Generative View Synthesis: From Single-view Semantics to Novel-view Images
 create mode 100644 data/2020/neurips/Generative causal explanations of black-box classifiers
 create mode 100644 data/2020/neurips/Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction
 create mode 100644 data/2020/neurips/Geometric All-way Boolean Tensor Decomposition
 create mode 100644 data/2020/neurips/Geometric Dataset Distances via Optimal Transport
 create mode 100644 data/2020/neurips/Geometric Exploration for Online Control
 create mode 100644 data/2020/neurips/Gibbs Sampling with People
 create mode 100644 data/2020/neurips/Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
 create mode 100644 data/2020/neurips/Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems
 create mode 100644 data/2020/neurips/Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology
 create mode 100644 data/2020/neurips/Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
 create mode 100644 data/2020/neurips/Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data
 create mode 100644 data/2020/neurips/Goal-directed Generation of Discrete Structures with Conditional Generative Models
 create mode 100644 data/2020/neurips/GradAug: A New Regularization Method for Deep Neural Networks
 create mode 100644 data/2020/neurips/Gradient Boosted Normalizing Flows
 create mode 100644 data/2020/neurips/Gradient Estimation with Stochastic Softmax Tricks
 create mode 100644 data/2020/neurips/Gradient Regularized V-Learning for Dynamic Treatment Regimes
 create mode 100644 data/2020/neurips/Gradient Surgery for Multi-Task Learning
 create mode 100644 data/2020/neurips/Gradient-EM Bayesian Meta-Learning
 create mode 100644 data/2020/neurips/Graduated Assignment for Joint Multi-Graph Matching and Clustering with Application to Unsupervised Graph Matching Network Learning
 create mode 100644 data/2020/neurips/GramGAN: Deep 3D Texture Synthesis From 2D Exemplars
 create mode 100644 data/2020/neurips/Graph Cross Networks with Vertex Infomax Pooling
 create mode 100644 data/2020/neurips/Graph Geometry Interaction Learning
 create mode 100644 data/2020/neurips/Graph Information Bottleneck
 create mode 100644 data/2020/neurips/Graph Meta Learning via Local Subgraphs
 create mode 100644 data/2020/neurips/Graph Policy Network for Transferable Active Learning on Graphs
 create mode 100644 data/2020/neurips/Graph Random Neural Networks for Semi-Supervised Learning on Graphs
 create mode 100644 data/2020/neurips/Graph Stochastic Neural Networks for Semi-supervised Learning
 create mode 100644 data/2020/neurips/Graphon Neural Networks and the Transferability of Graph Neural Networks
 create mode 100644 data/2020/neurips/Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps
 create mode 100644 data/2020/neurips/Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough
 create mode 100644 data/2020/neurips/Greedy inference with structure-exploiting lazy maps
 create mode 100644 data/2020/neurips/GreedyFool: Distortion-Aware Sparse Adversarial Attack
 create mode 100644 data/2020/neurips/Group Contextual Encoding for 3D Point Clouds
 create mode 100644 data/2020/neurips/Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge
 create mode 100644 data/2020/neurips/Group-Fair Online Allocation in Continuous Time
 create mode 100644 data/2020/neurips/Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses
 create mode 100644 data/2020/neurips/Guiding Deep Molecular Optimization with Genetic Exploration
 create mode 100644 data/2020/neurips/H-Mem: Harnessing synaptic plasticity with Hebbian Memory Networks
 create mode 100644 data/2020/neurips/HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
 create mode 100644 data/2020/neurips/HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory
 create mode 100644 data/2020/neurips/HOI Analysis: Integrating and Decomposing Human-Object Interaction
 create mode 100644 data/2020/neurips/HRN: A Holistic Approach to One Class Learning
 create mode 100644 data/2020/neurips/HYDRA: Pruning Adversarially Robust Neural Networks
 create mode 100644 data/2020/neurips/Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond
 create mode 100644 data/2020/neurips/Handling Missing Data with Graph Representation Learning
 create mode 100644 data/2020/neurips/Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning
 create mode 100644 data/2020/neurips/Hard Negative Mixing for Contrastive Learning
 create mode 100644 data/2020/neurips/Hard Shape-Constrained Kernel Machines
 create mode 100644 data/2020/neurips/Hardness of Learning Neural Networks with Natural Weights
 create mode 100644 data/2020/neurips/Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
 create mode 100644 data/2020/neurips/Heavy-tailed Representations, Text Polarity Classification & Data Augmentation
 create mode 100644 data/2020/neurips/Hedging in games: Faster convergence of external and swap regrets
 create mode 100644 data/2020/neurips/Heuristic Domain Adaptation
 create mode 100644 data/2020/neurips/HiFi-GAN: Generative
Adversarial Networks for Efficient and High Fidelity Speech Synthesis create mode 100644 data/2020/neurips/HiPPO: Recurrent Memory with Optimal Polynomial Projections create mode 100644 data/2020/neurips/Hierarchical Gaussian Process Priors for Bayesian Neural Network Weights create mode 100644 data/2020/neurips/Hierarchical Granularity Transfer Learning create mode 100644 data/2020/neurips/Hierarchical Neural Architecture Search for Deep Stereo Matching create mode 100644 data/2020/neurips/Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample create mode 100644 data/2020/neurips/Hierarchical Poset Decoding for Compositional Generalization in Language create mode 100644 data/2020/neurips/Hierarchical Quantized Autoencoders create mode 100644 data/2020/neurips/Hierarchical nucleation in deep neural networks create mode 100644 data/2020/neurips/Hierarchically Organized Latent Modules for Exploratory Search in Morphogenetic Systems create mode 100644 data/2020/neurips/High-Dimensional Bayesian Optimization via Nested Riemannian Manifolds create mode 100644 data/2020/neurips/High-Dimensional Contextual Policy Search with Unknown Context Rewards using Bayesian Optimization create mode 100644 data/2020/neurips/High-Dimensional Sparse Linear Bandits create mode 100644 data/2020/neurips/High-Fidelity Generative Image Compression create mode 100644 data/2020/neurips/High-Throughput Synchronous Deep RL create mode 100644 "data/2020/neurips/High-contrast \"gaudy\" images improve the training of deep neural network models of visual cortex" create mode 100644 data/2020/neurips/High-recall causal discovery for autocorrelated time series with latent confounders create mode 100644 data/2020/neurips/Higher-Order Certification For Randomized Smoothing create mode 100644 data/2020/neurips/Higher-Order Spectral Clustering of Directed Graphs create mode 100644 data/2020/neurips/Hitting the High Notes: Subset Selection for Maximizing Expected Order Statistics create mode 100644 data/2020/neurips/Hold me tight! Influence of discriminative features on deep network boundaries create mode 100644 data/2020/neurips/How Can I Explain This to You? An Empirical Study of Deep Neural Network Explanation Methods create mode 100644 data/2020/neurips/How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19? create mode 100644 data/2020/neurips/How do fair decisions fare in long-term qualification? create mode 100644 data/2020/neurips/How does This Interaction Affect Me? Interpretable Attribution for Feature Interactions create mode 100644 data/2020/neurips/How does Weight Correlation Affect Generalisation Ability of Deep Neural Networks? create mode 100644 data/2020/neurips/How hard is to distinguish graphs with graph neural networks? create mode 100644 data/2020/neurips/How many samples is a good initial point worth in Low-rank Matrix Recovery? create mode 100644 data/2020/neurips/How to Characterize The Landscape of Overparameterized Convolutional Neural Networks create mode 100644 data/2020/neurips/How to Learn a Useful Critic? 
Model-based Action-Gradient-Estimator Policy Optimization create mode 100644 data/2020/neurips/Human Parsing Based Texture Transfer from Single Image to 3D Human via Cross-View Consistency create mode 100644 data/2020/neurips/HyNet: Learning Local Descriptor with Hybrid Similarity Measure and Triplet Loss create mode 100644 data/2020/neurips/Hybrid Models for Learning to Branch create mode 100644 data/2020/neurips/Hybrid Variance-Reduced SGD Algorithms For Minimax Problems with Nonconvex-Linear Function create mode 100644 data/2020/neurips/Hypersolvers: Toward Fast Continuous-Depth Models create mode 100644 data/2020/neurips/ICAM: Interpretable Classification via Disentangled Representations and Feature Attribution Mapping create mode 100644 data/2020/neurips/ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA create mode 100644 data/2020/neurips/ICNet: Intra-saliency Correlation Network for Co-Saliency Detection create mode 100644 data/2020/neurips/IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method create mode 100644 data/2020/neurips/ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding create mode 100644 data/2020/neurips/Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models create mode 100644 data/2020/neurips/Identifying Learning Rules From Neural Network Observables create mode 100644 data/2020/neurips/Identifying Mislabeled Data using the Area Under the Margin Ranking create mode 100644 data/2020/neurips/Identifying signal and noise structure in neural population activity with Gaussian process factor models create mode 100644 data/2020/neurips/ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool create mode 100644 data/2020/neurips/Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy create mode 100644 data/2020/neurips/Implicit Distributional Reinforcement Learning create mode 100644 data/2020/neurips/Implicit Graph Neural Networks create mode 100644 data/2020/neurips/Implicit Neural Representations with Periodic Activation Functions create mode 100644 data/2020/neurips/Implicit Rank-Minimizing Autoencoder create mode 100644 data/2020/neurips/Implicit Regularization in Deep Learning May Not Be Explainable by Norms create mode 100644 data/2020/neurips/Impossibility Results for Grammar-Compressed Linear Algebra create mode 100644 data/2020/neurips/Improved Algorithms for Convex-Concave Minimax Optimization create mode 100644 data/2020/neurips/Improved Algorithms for Online Submodular Maximization via First-order Regret Bounds create mode 100644 data/2020/neurips/Improved Analysis of Clipping Algorithms for Non-convex Optimization create mode 100644 data/2020/neurips/Improved Guarantees for k-means++ and k-means++ Parallel create mode 100644 data/2020/neurips/Improved Sample Complexity for Incremental Autonomous Exploration in MDPs create mode 100644 data/2020/neurips/Improved Schemes for Episodic Memory-based Lifelong Learning create mode 100644 data/2020/neurips/Improved Techniques for Training Score-Based Generative Models create mode 100644 data/2020/neurips/Improved Variational Bayesian Phylogenetic Inference with Normalizing Flows create mode 100644 data/2020/neurips/Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nystrom method create mode 100644 data/2020/neurips/Improving Auto-Augment via Augmentation-Wise Weight Sharing create mode 100644 data/2020/neurips/Improving GAN 
Training with Probability Ratio Clipping and Sample Reweighting create mode 100644 data/2020/neurips/Improving Generalization in Reinforcement Learning with Mixture Regularization create mode 100644 data/2020/neurips/Improving Inference for Neural Image Compression create mode 100644 data/2020/neurips/Improving Local Identifiability in Probabilistic Box Embeddings create mode 100644 data/2020/neurips/Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention create mode 100644 data/2020/neurips/Improving Neural Network Training in Low Dimensional Random Bases create mode 100644 data/2020/neurips/Improving Online Rent-or-Buy Algorithms with Sequential Decision Making and ML Predictions create mode 100644 data/2020/neurips/Improving Policy-Constrained Kidney Exchange via Pre-Screening create mode 100644 data/2020/neurips/Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms create mode 100644 data/2020/neurips/Improving Sparse Vector Technique with Renyi Differential Privacy create mode 100644 data/2020/neurips/Improving model calibration with accuracy versus uncertainty optimization create mode 100644 data/2020/neurips/Improving robustness against common corruptions by covariate shift adaptation create mode 100644 data/2020/neurips/In search of robust measures of generalization create mode 100644 data/2020/neurips/Incorporating BERT into Parallel Sequence Decoding with Adapters create mode 100644 data/2020/neurips/Incorporating Interpretable Output Constraints in Bayesian Neural Networks create mode 100644 data/2020/neurips/Incorporating Pragmatic Reasoning Communication into Emergent Language create mode 100644 data/2020/neurips/Independent Policy Gradient Methods for Competitive Reinforcement Learning create mode 100644 data/2020/neurips/Inductive Quantum Embedding create mode 100644 data/2020/neurips/Inference Stage Optimization for Cross-scenario 3D Human Pose Estimation create mode 100644 data/2020/neurips/Inference for Batched Bandits create mode 100644 data/2020/neurips/Inferring learning rules from animal decision-making create mode 100644 data/2020/neurips/Influence-Augmented Online Planning for Complex Environments create mode 100644 data/2020/neurips/Information Maximization for Few-Shot Learning create mode 100644 data/2020/neurips/Information Theoretic Counterfactual Learning from Missing-Not-At-Random Feedback create mode 100644 data/2020/neurips/Information Theoretic Regret Bounds for Online Nonlinear Control create mode 100644 data/2020/neurips/Information theoretic limits of learning a sparse rule create mode 100644 data/2020/neurips/Information-theoretic Task Selection for Meta-Reinforcement Learning create mode 100644 data/2020/neurips/Input-Aware Dynamic Backdoor Attack create mode 100644 data/2020/neurips/Instance Based Approximations to Profile Maximum Likelihood create mode 100644 data/2020/neurips/Instance Selection for GANs create mode 100644 data/2020/neurips/Instance-based Generalization in Reinforcement Learning create mode 100644 data/2020/neurips/Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms create mode 100644 data/2020/neurips/Instance-wise Feature Grouping create mode 100644 data/2020/neurips/Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients create mode 100644 data/2020/neurips/Interferobot: aligning an optical interferometer by a reinforcement learning agent create mode 100644 data/2020/neurips/Interior Point Solving for 
LP-based prediction+optimisation create mode 100644 data/2020/neurips/Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs create mode 100644 data/2020/neurips/Interpretable Sequence Learning for Covid-19 Forecasting create mode 100644 data/2020/neurips/Interpretable and Personalized Apprenticeship Scheduling: Learning Interpretable Scheduling Policies from Heterogeneous User Demonstrations create mode 100644 data/2020/neurips/Interpretable multi-timescale models for predicting fMRI responses to continuous natural speech create mode 100644 data/2020/neurips/Interstellar: Searching Recurrent Architecture for Knowledge Graph Embedding create mode 100644 data/2020/neurips/Interventional Few-Shot Learning create mode 100644 data/2020/neurips/Intra Order-preserving Functions for Calibration of Multi-Class Neural Networks create mode 100644 data/2020/neurips/Intra-Processing Methods for Debiasing Neural Networks create mode 100644 data/2020/neurips/Introducing Routing Uncertainty in Capsule Networks create mode 100644 data/2020/neurips/Inverse Learning of Symmetries create mode 100644 data/2020/neurips/Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics create mode 100644 data/2020/neurips/Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax create mode 100644 data/2020/neurips/Inverting Gradients - How easy is it to break privacy in federated learning? create mode 100644 data/2020/neurips/Investigating Gender Bias in Language Models Using Causal Mediation Analysis create mode 100644 data/2020/neurips/Is Long Horizon RL More Difficult Than Short Horizon RL? create mode 100644 data/2020/neurips/Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning? create mode 100644 data/2020/neurips/Is normalization indispensable for training deep neural network? 
 create mode 100644 data/2020/neurips/Iterative Deep Graph Learning for Graph Neural Networks: Better and Robust Node Embeddings
 create mode 100644 data/2020/neurips/JAX MD: A Framework for Differentiable Physics
 create mode 100644 data/2020/neurips/Joint Contrastive Learning with Infinite Possibilities
 create mode 100644 data/2020/neurips/Joint Policy Search for Multi-agent Collaboration with Imperfect Information
 create mode 100644 data/2020/neurips/Joints in Random Forests
 create mode 100644 data/2020/neurips/Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
 create mode 100644 data/2020/neurips/KFC: A Scalable Approximation Algorithm for $k$-center Fair Clustering
 create mode 100644 data/2020/neurips/Kalman Filtering Attention for User Behavior Modeling in CTR Prediction
 create mode 100644 data/2020/neurips/Kernel Alignment Risk Estimator: Risk Prediction from Training Data
 create mode 100644 data/2020/neurips/Kernel Based Progressive Distillation for Adder Neural Networks
 create mode 100644 data/2020/neurips/Kernel Methods Through the Roof: Handling Billions of Points Efficiently
 create mode 100644 data/2020/neurips/Kernelized information bottleneck leads to biologically plausible 3-factor Hebbian learning in deep networks
 create mode 100644 data/2020/neurips/Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition
 create mode 100644 data/2020/neurips/Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher
 create mode 100644 data/2020/neurips/Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control
 create mode 100644 data/2020/neurips/LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-resolution and Beyond
 create mode 100644 data/2020/neurips/Label-Aware Neural Tangent Kernel: Toward Better Generalization and Local Elasticity
 create mode 100644 data/2020/neurips/Labelling unlabelled videos from scratch with multi-modal self-supervision
 create mode 100644 data/2020/neurips/Lamina-specific neuronal properties promote robust, stable signal propagation in feedforward networks
 create mode 100644 data/2020/neurips/Language Models are Few-Shot Learners
 create mode 100644 data/2020/neurips/Language Through a Prism: A Spectral Approach for Multiscale Language Representations
 create mode 100644 data/2020/neurips/Language and Visual Entity Relationship Graph for Agent Navigation
 create mode 100644 data/2020/neurips/Language as a Cognitive Tool to Imagine Goals in Curiosity Driven Exploration
 create mode 100644 data/2020/neurips/Language-Conditioned Imitation Learning for Robot Manipulation Tasks
 create mode 100644 data/2020/neurips/Large-Scale Adversarial Training for Vision-and-Language Representation Learning
 create mode 100644 data/2020/neurips/Large-Scale Methods for Distributionally Robust Optimization
 create mode 100644 data/2020/neurips/Latent Bandits Revisited
 create mode 100644 data/2020/neurips/Latent Dynamic Factor Analysis of High-Dimensional Neural Recordings
 create mode 100644 data/2020/neurips/Latent Template Induction with Gumbel-CRFs
 create mode 100644 data/2020/neurips/Latent World Models For Intrinsically Motivated Exploration
 create mode 100644 data/2020/neurips/Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge
 create mode 100644 data/2020/neurips/Learnability with Indirect Supervision Signals
 create mode 100644 data/2020/neurips/Learning About Objects by Learning to Interact with Them
 create mode 100644 data/2020/neurips/Learning Affordance Landscapes for Interaction Exploration in 3D Environments
 create mode 100644 data/2020/neurips/Learning Agent Representations for Ice Hockey
 create mode 100644 data/2020/neurips/Learning Augmented Energy Minimization via Speed Scaling
 create mode 100644 data/2020/neurips/Learning Bounds for Risk-sensitive Learning
 create mode 100644 data/2020/neurips/Learning Causal Effects via Weighted Empirical Risk Minimization
 create mode 100644 data/2020/neurips/Learning Certified Individually Fair Representations
 create mode 100644 data/2020/neurips/Learning Composable Energy Surrogates for PDE Order Reduction
 create mode 100644 data/2020/neurips/Learning Compositional Rules via Neural Program Synthesis
 create mode 100644 data/2020/neurips/Learning Continuous System Dynamics from Irregularly-Sampled Partial Observations
 create mode 100644 data/2020/neurips/Learning Deep Attribution Priors Based On Prior Knowledge
 create mode 100644 data/2020/neurips/Learning Deformable Tetrahedral Meshes for 3D Reconstruction
 create mode 100644 data/2020/neurips/Learning Differentiable Programs with Admissible Neural Heuristics
 create mode 100644 data/2020/neurips/Learning Differential Equations that are Easy to Solve
 create mode 100644 data/2020/neurips/Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration
 create mode 100644 data/2020/neurips/Learning Disentangled Representations and Group Structure of Dynamical Environments
 create mode 100644 data/2020/neurips/Learning Disentangled Representations of Videos with Missing Data
 create mode 100644 data/2020/neurips/Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction
 create mode 100644 data/2020/neurips/Learning Dynamic Belief Graphs to Generalize on Text-Based Games
 create mode 100644 data/2020/neurips/Learning Feature Sparse Principal Subspace
 create mode 100644 data/2020/neurips/Learning Global Transparent Models consistent with Local Contrastive Explanations
 create mode 100644 data/2020/neurips/Learning Graph Structure With A Finite-State Automaton Layer
 create mode 100644 data/2020/neurips/Learning Guidance Rewards with Trajectory-space Smoothing
 create mode 100644 data/2020/neurips/Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning
 create mode 100644 data/2020/neurips/Learning Implicit Functions for Topology-Varying Dense 3D Shape Correspondence
 create mode 100644 data/2020/neurips/Learning Individually Inferred Communication for Multi-Agent Cooperation
 create mode 100644 data/2020/neurips/Learning Invariances in Neural Networks from Training Data
 create mode 100644 data/2020/neurips/Learning Invariants through Soft Unification
 create mode 100644 data/2020/neurips/Learning Kernel Tests Without Data Splitting
 create mode 100644 data/2020/neurips/Learning Latent Space Energy-Based Prior Model
 create mode 100644 data/2020/neurips/Learning Linear Programs from Optimal Decisions
 create mode 100644 data/2020/neurips/Learning Loss for Test-Time Augmentation
 create mode 100644 data/2020/neurips/Learning Manifold Implicitly via Explicit Heat-Kernel Learning
 create mode 100644 data/2020/neurips/Learning Multi-Agent Communication through Structured Attentive Reasoning
 create mode 100644 data/2020/neurips/Learning Multi-Agent Coordination for Enhancing Target Coverage in Directional Sensor Networks
 create mode 100644 data/2020/neurips/Learning Mutational Semantics
 create mode 100644 data/2020/neurips/Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views
 create mode 100644 data/2020/neurips/Learning Optimal Representations with the Decodable Information Bottleneck
 create mode 100644 data/2020/neurips/Learning Parities with Neural Networks
 create mode 100644 data/2020/neurips/Learning Physical Constraints with Neural Projections
 create mode 100644 data/2020/neurips/Learning Physical Graph Representations from Visual Scenes
 create mode 100644 data/2020/neurips/Learning Representations from Audio-Visual Spatial Alignment
 create mode 100644 data/2020/neurips/Learning Restricted Boltzmann Machines with Sparse Latent Variables
 create mode 100644 data/2020/neurips/Learning Retrospective Knowledge with Reverse Reinforcement Learning
 create mode 100644 data/2020/neurips/Learning Rich Rankings
 create mode 100644 data/2020/neurips/Learning Robust Decision Policies from Observational Data
 create mode 100644 data/2020/neurips/Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search
 create mode 100644 data/2020/neurips/Learning Semantic-aware Normalization for Generative Adversarial Networks
 create mode 100644 data/2020/neurips/Learning Some Popular Gaussian Graphical Models without Condition Number Bounds
 create mode 100644 data/2020/neurips/Learning Sparse Prototypes for Text Generation
 create mode 100644 data/2020/neurips/Learning Strategic Network Emergence Games
 create mode 100644 data/2020/neurips/Learning Strategy-Aware Linear Classifiers
 create mode 100644 data/2020/neurips/Learning Structured Distributions From Untrusted Batches: Faster and Simpler
 create mode 100644 data/2020/neurips/Learning Utilities and Equilibria in Non-Truthful Auctions
 create mode 100644 data/2020/neurips/Learning abstract structure for drawing by efficient motor program induction
 create mode 100644 data/2020/neurips/Learning by Minimizing the Sum of Ranked Range
 create mode 100644 data/2020/neurips/Learning compositional functions via multiplicative weight updates
 create mode 100644 data/2020/neurips/Learning discrete distributions with infinite support
 create mode 100644 data/2020/neurips/Learning discrete distributions: user vs item-level privacy
 create mode 100644 data/2020/neurips/Learning efficient task-dependent representations with synaptic plasticity
 create mode 100644 data/2020/neurips/Learning from Aggregate Observations
 create mode 100644 data/2020/neurips/Learning from Failure: De-biasing Classifier from Biased Classifier
 create mode 100644 data/2020/neurips/Learning from Label Proportions: A Mutual Contamination Framework
 create mode 100644 data/2020/neurips/Learning from Mixtures of Private and Public Populations
 create mode 100644 data/2020/neurips/Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
 create mode 100644 data/2020/neurips/Learning identifiable and interpretable latent models of high-dimensional neural activity using pi-VAE
 create mode 100644 data/2020/neurips/Learning of Discrete Graphical Models with Neural Networks
 create mode 100644 data/2020/neurips/Learning outside the Black-Box: The pursuit of interpretable models
 create mode 100644 data/2020/neurips/Learning sparse codes from compressed representations with biologically plausible local wiring constraints
 create mode 100644 data/2020/neurips/Learning the Geometry of Wave-Based Imaging
 create mode 100644 data/2020/neurips/Learning the Linear Quadratic Regulator from Nonlinear Observations
 create mode 100644 data/2020/neurips/Learning to Adapt to Evolving Domains
 create mode 100644 data/2020/neurips/Learning to Approximate a Bregman Divergence
 create mode 100644 data/2020/neurips/Learning to Decode: Reinforcement Learning for Decoding of Sparse Graph-Based Channel Codes
 create mode 100644 data/2020/neurips/Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning
 create mode 100644 data/2020/neurips/Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks
 create mode 100644 data/2020/neurips/Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Prediction
 create mode 100644 data/2020/neurips/Learning to Incentivize Other Learning Agents
 create mode 100644 data/2020/neurips/Learning to Learn Variational Semantic Memory
 create mode 100644 data/2020/neurips/Learning to Learn with Feedback and Local Plasticity
 create mode 100644 data/2020/neurips/Learning to Mutate with Hypergradient Guided Population
 create mode 100644 data/2020/neurips/Learning to Orient Surfaces by Self-supervised Spherical CNNs
 create mode 100644 data/2020/neurips/Learning to Play No-Press Diplomacy with Best Response Policy Iteration
 create mode 100644 data/2020/neurips/Learning to Play Sequential Games versus Unknown Opponents
 create mode 100644 data/2020/neurips/Learning to Prove Theorems by Learning to Generate Theorems
 create mode 100644 data/2020/neurips/Learning to Select Best Forecast Tasks for Clinical Outcome Prediction
 create mode 100644 data/2020/neurips/Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping
 create mode 100644 data/2020/neurips/Learning to search efficiently for causally near-optimal treatments
 create mode 100644 data/2020/neurips/Learning to solve TV regularised problems with unrolled algorithms
 create mode 100644 data/2020/neurips/Learning to summarize with human feedback
 create mode 100644 data/2020/neurips/Learning under Model Misspecification: Applications to Variational and Ensemble methods
 create mode 100644 data/2020/neurips/Learning with Differentiable Pertubed Optimizers
 create mode 100644 data/2020/neurips/Learning with Operator-valued Kernels in Reproducing Kernel Krein Spaces
 create mode 100644 data/2020/neurips/Learning with Optimized Random Features: Exponential Speedup by Quantum Machine Learning without Sparsity and Low-Rank Assumptions
 create mode 100644 data/2020/neurips/Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms
 create mode 100644 data/2020/neurips/Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning
 create mode 100644 data/2020/neurips/Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms
 create mode 100644 data/2020/neurips/Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations
 create mode 100644 data/2020/neurips/Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting
 create mode 100644 data/2020/neurips/Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation
 create mode 100644 data/2020/neurips/Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder
 create mode 100644 data/2020/neurips/Limits on Testing Structural Changes in Ising Models
 create mode 100644 data/2020/neurips/Limits to Depth Efficiencies of Self-Attention
 create mode 100644 data/2020/neurips/Linear Disentangled Representations and Unsupervised Action Estimation
 create mode 100644 data/2020/neurips/Linear Dynamical Systems as a Core Computational Primitive
 create mode 100644 data/2020/neurips/Linear Time Sinkhorn Divergences using Positive Features
 create mode 100644 data/2020/neurips/Linear-Sample Learning of Low-Rank Distributions
 create mode 100644 data/2020/neurips/Linearly Converging Error Compensated SGD
 create mode 100644 data/2020/neurips/Lipschitz Bounds and Provably Robust Training by Laplacian Smoothing
 create mode 100644 data/2020/neurips/Lipschitz-Certifiable Training with a Tight Outer Bound
 create mode 100644 data/2020/neurips/List-Decodable Mean Estimation via Iterative Multi-Filtering
 create mode 100644 data/2020/neurips/Listening to Sounds of Silence for Speech Denoising
 create mode 100644 data/2020/neurips/LoCo: Local Contrastive Representation Learning
 create mode 100644 data/2020/neurips/Locally Differentially Private (Contextual) Bandits Learning
 create mode 100644 data/2020/neurips/Locally private non-asymptotic testing of discrete distributions is faster using interactive mechanisms
 create mode 100644 data/2020/neurips/Locally-Adaptive Nonparametric Online Learning
 create mode 100644 data/2020/neurips/Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment
 create mode 100644 data/2020/neurips/Logarithmic Pruning is All You Need
 create mode 100644 data/2020/neurips/Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems
 create mode 100644 data/2020/neurips/Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors
 create mode 100644 data/2020/neurips/Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect
 create mode 100644 data/2020/neurips/LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration
 create mode 100644 data/2020/neurips/Low Distortion Block-Resampling with Spatially Stochastic Networks
 create mode 100644 data/2020/neurips/MATE: Plugging in Model Awareness to Task Embedding for Meta Learning
 create mode 100644 data/2020/neurips/MCUNet: Tiny Deep Learning on IoT Devices
 create mode 100644 data/2020/neurips/MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning
 create mode 100644 data/2020/neurips/MESA: Boost Ensemble Imbalanced Learning with MEta-SAmpler
 create mode 100644 data/2020/neurips/MMA Regularization: Decorrelating Weights of Neural Networks by Maximizing the Minimal Angles
 create mode 100644 data/2020/neurips/MOPO: Model-based Offline Policy Optimization
 create mode 100644 data/2020/neurips/MOReL: Model-Based Offline Reinforcement Learning
 create mode 100644 data/2020/neurips/MPNet: Masked and Permuted Pre-training for Language Understanding
 create mode 100644 data/2020/neurips/MRI Banding Removal via Adversarial Training
 create mode 100644 data/2020/neurips/Make One-Shot Video Object Segmentation Efficient Again
 create mode 100644 data/2020/neurips/Making Non-Stochastic Control (Almost) as Easy as Stochastic
 create mode 100644 data/2020/neurips/Manifold GPLVMs for discovering non-Euclidean latent structure in neural data
 create mode 100644 data/2020/neurips/Manifold structure in graph embeddings
 create mode 100644 data/2020/neurips/Marginal Utility for Planning in Continuous or Large Discrete Action Spaces
 create mode 100644 data/2020/neurips/Margins are Insufficient for Explaining Gradient Boosting
 create mode 100644 data/2020/neurips/Matrix Completion with Hierarchical Graph Side Information
 create mode 100644 data/2020/neurips/Matrix Inference and Estimation in Multi-Layer Models
 create mode 100644 "data/2020/neurips/Mat\303\251rn Gaussian Processes on Riemannian Manifolds"
 create mode 100644 data/2020/neurips/Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
 create mode 100644 data/2020/neurips/Measuring Robustness to Natural Distribution Shifts in Image Classification
 create mode 100644 data/2020/neurips/Measuring Systematic Generalization in Neural Proof Generation with Transformers
 create mode 100644 data/2020/neurips/Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards
 create mode 100644 data/2020/neurips/Memory-Efficient Learning of Stable Linear Dynamical Systems for Prediction and Control
 create mode 100644 data/2020/neurips/MeshSDF: Differentiable Iso-Surface Extraction
 create mode 100644 data/2020/neurips/Meta-Consolidation for Continual Learning
 create mode 100644 data/2020/neurips/Meta-Gradient Reinforcement Learning with an Objective Discovered Online
 create mode 100644 data/2020/neurips/Meta-Learning Requires Meta-Augmentation
 create mode 100644 data/2020/neurips/Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes
 create mode 100644 data/2020/neurips/Meta-Learning through Hebbian Plasticity in Random Networks
 create mode 100644 data/2020/neurips/Meta-Learning with Adaptive Hyperparameters
 create mode 100644 data/2020/neurips/Meta-Neighborhoods
 create mode 100644 data/2020/neurips/Meta-learning from Tasks with Heterogeneous Attribute Spaces
 create mode 100644 data/2020/neurips/Meta-trained agents implement Bayes-optimal agents
 create mode 100644 data/2020/neurips/MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
 create mode 100644 data/2020/neurips/MetaPoison: Practical General-purpose Clean-label Data Poisoning
 create mode 100644 data/2020/neurips/MetaSDF: Meta-Learning Signed Distance Functions
 create mode 100644 data/2020/neurips/Metric-Free Individual Fairness in Online Learning
 create mode 100644 data/2020/neurips/MinMax Methods for Optimal Transport and Beyond: Regularization, Approximation and Numerics
 create mode 100644 data/2020/neurips/MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
 create mode 100644 data/2020/neurips/Minibatch Stochastic Approximate Proximal Point Methods
 create mode 100644 data/2020/neurips/Minibatch vs Local SGD for Heterogeneous Distributed Learning
 create mode 100644 data/2020/neurips/Minimax Bounds for Generalized Linear Models
 create mode 100644 data/2020/neurips/Minimax Classification with 0-1 Loss and Performance Guarantees
 create mode 100644 data/2020/neurips/Minimax Dynamics of Optimally Balanced Spiking Networks of Excitatory and Inhibitory Neurons
 create mode 100644 data/2020/neurips/Minimax Estimation of Conditional Moment Models
 create mode 100644 data/2020/neurips/Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks
 create mode 100644 data/2020/neurips/Minimax Optimal Nonparametric Estimation of Heterogeneous Treatment Effects
 create mode 100644 data/2020/neurips/Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition
 create mode 100644 data/2020/neurips/Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
 create mode 100644 data/2020/neurips/Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization
 create mode 100644 data/2020/neurips/Mitigating Manipulation in Peer Review via Randomized Reviewer Assignments
 create mode 100644 data/2020/neurips/Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions
 create mode 100644 data/2020/neurips/Mixed Hamiltonian Monte Carlo for Mixed Discrete and Continuous Variables
 create mode 100644 data/2020/neurips/Model Agnostic Multilevel Explanations
 create mode 100644 data/2020/neurips/Model Class Reliance for Random Forests
 create mode 100644 data/2020/neurips/Model Fusion via Optimal Transport
 create mode 100644 data/2020/neurips/Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets
 create mode 100644 data/2020/neurips/Model Selection for Production System via Automated Online Experiments
 create mode 100644 data/2020/neurips/Model Selection in Contextual Stochastic Bandit Problems
 create mode 100644 data/2020/neurips/Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
 create mode 100644 data/2020/neurips/Model-based Adversarial Meta-Reinforcement Learning
 create mode 100644 data/2020/neurips/Model-based Policy Optimization with Unsupervised Model Adaptation
 create mode 100644 data/2020/neurips/Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs
 create mode 100644 data/2020/neurips/Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows
 create mode 100644 data/2020/neurips/Modeling Noisy Annotations for Crowd Counting
 create mode 100644 data/2020/neurips/Modeling Shared responses in Neuroimaging Studies through MultiView ICA
 create mode 100644 data/2020/neurips/Modeling Task Effects on Meaning Representation in the Brain via Zero-Shot MEG Prediction
 create mode 100644 data/2020/neurips/Modeling and Optimization Trade-off in Meta-learning
 create mode 100644 data/2020/neurips/Modern Hopfield Networks and Attention for Immune Repertoire Classification
 create mode 100644 data/2020/neurips/Modular Meta-Learning with Shrinkage
 create mode 100644 data/2020/neurips/MomentumRNN: Integrating Momentum into Recurrent Neural Networks
 create mode 100644 data/2020/neurips/Monotone operator equilibrium networks
 create mode 100644 data/2020/neurips/Movement Pruning: Adaptive Sparsity by Fine-Tuning
 create mode 100644 data/2020/neurips/MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models
 create mode 100644 data/2020/neurips/Multi-Fidelity Bayesian Optimization via Deep Neural Networks
 create mode 100644 data/2020/neurips/Multi-Plane Program Induction with 3D Box Priors
 create mode 100644 data/2020/neurips/Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates
 create mode 100644 data/2020/neurips/Multi-Stage Influence Function
 create mode 100644 data/2020/neurips/Multi-Task Reinforcement Learning with Soft Modularization
 create mode 100644 data/2020/neurips/Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement
 create mode 100644 data/2020/neurips/Multi-agent Trajectory Prediction with Fuzzy Query Attention
 create mode 100644 data/2020/neurips/Multi-agent active perception with prediction rewards
 create mode 100644 data/2020/neurips/Multi-label Contrastive Predictive Coding
 create mode 100644 data/2020/neurips/Multi-label classification: do Hamming loss and subset accuracy really conflict with each other?
 create mode 100644 data/2020/neurips/Multi-task Additive Models for Robust Estimation and Automatic Structure Discovery
 create mode 100644 data/2020/neurips/Multi-task Batch Reinforcement Learning with Metric Learning
 create mode 100644 data/2020/neurips/Multi-task Causal Learning with Gaussian Processes
 create mode 100644 data/2020/neurips/MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
 create mode 100644 data/2020/neurips/Multifaceted Uncertainty Estimation for Label-Efficient Deep Learning
 create mode 100644 data/2020/neurips/Multilabel Classification by Hierarchical Partitioning and Data-dependent Grouping
 create mode 100644 data/2020/neurips/Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence
 create mode 100644 data/2020/neurips/Multimodal Graph Networks for Compositional Generalization in Visual Question Answering
 create mode 100644 data/2020/neurips/Multiparameter Persistence Image for Topological Machine Learning
 create mode 100644 data/2020/neurips/Multipole Graph Neural Operator for Parametric Partial Differential Equations
 create mode 100644 data/2020/neurips/Multiscale Deep Equilibrium Models
 create mode 100644 data/2020/neurips/Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance
 create mode 100644 data/2020/neurips/Munchausen Reinforcement Learning
 create mode 100644 data/2020/neurips/Mutual exclusivity as a challenge for deep neural networks
 create mode 100644 data/2020/neurips/Myersonian Regression
 create mode 100644 data/2020/neurips/NVAE: A Deep Hierarchical Variational Autoencoder
 create mode 100644 data/2020/neurips/NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity
 create mode 100644 data/2020/neurips/Natural Graph Networks
 create mode 100644 data/2020/neurips/Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes
 create mode 100644 data/2020/neurips/Near-Optimal Comparison Based Clustering
 create mode 100644 data/2020/neurips/Near-Optimal Reinforcement Learning with Self-Play
 create mode 100644 data/2020/neurips/Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals
 create mode 100644 data/2020/neurips/Network Diffusions via Neural Mean-Field Dynamics
 create mode 100644 data/2020/neurips/Network size and size of the weights in memorization with two-layers neural networks
 create mode 100644 data/2020/neurips/Network-to-Network Translation with Conditional Invertible Neural Networks
 create mode 100644 data/2020/neurips/NeuMiss networks: differentiable programming for supervised learning with missing values
 create mode 100644 data/2020/neurips/Neural Anisotropy Directions
 create mode 100644 data/2020/neurips/Neural Architecture Generator Optimization
 create mode 100644 data/2020/neurips/Neural Bridge Sampling for Evaluating Safety-Critical Autonomous Systems
 create mode 100644 data/2020/neurips/Neural Complexity Measures
 create mode 100644 data/2020/neurips/Neural Controlled Differential Equations for Irregular Time Series
 create mode 100644 data/2020/neurips/Neural Dynamic Policies for End-to-End Sensorimotor Learning
 create mode 100644 data/2020/neurips/Neural Execution Engines: Learning to Execute Subroutines
 create mode 100644 data/2020/neurips/Neural FFTs for Universal Texture Image Synthesis
 create mode 100644 data/2020/neurips/Neural Manifold Ordinary Differential Equations
 create mode 100644 data/2020/neurips/Neural Mesh Flow: 3D Manifold Mesh Generation via Diffeomorphic Flows
 create mode 100644 data/2020/neurips/Neural Message Passing for Multi-Relational Ordered and Recursive Hypergraphs
 create mode 100644 data/2020/neurips/Neural Methods for Point-wise Dependency Estimation
 create mode 100644 data/2020/neurips/Neural Networks Fail to Learn Periodic Functions and How to Fix It
 create mode 100644 data/2020/neurips/Neural Networks Learning and Memorization with (almost) no Over-Parameterization
 create mode 100644 data/2020/neurips/Neural Networks with Recurrent Generative Feedback
 create mode 100644 data/2020/neurips/Neural Networks with Small Weights and Depth-Separation Barriers
 create mode 100644 data/2020/neurips/Neural Non-Rigid Tracking
 create mode 100644 data/2020/neurips/Neural Path Features and Neural Path Kernel : Understanding the role of gates in deep learning
 create mode 100644 data/2020/neurips/Neural Power Units
 create mode 100644 data/2020/neurips/Neural Sparse Representation for Image Restoration
 create mode 100644 data/2020/neurips/Neural Sparse Voxel Fields
 create mode 100644 data/2020/neurips/Neural Star Domain as Primitive Representation
 create mode 100644 data/2020/neurips/Neural Topographic Factor Analysis for fMRI Data
 create mode 100644 data/2020/neurips/Neural Unsigned Distance Fields for Implicit Function Learning
 create mode 100644 data/2020/neurips/Neural encoding with visual attention
 create mode 100644 data/2020/neurips/Neuron Merging: Compensating for Pruned Neurons
 create mode 100644 data/2020/neurips/Neuron Shapley: Discovering the Responsible Neurons
 create mode 100644 data/2020/neurips/Neuron-level Structured Pruning using Polarization Regularizer
 create mode 100644 data/2020/neurips/Neuronal Gaussian Process Regression
 create mode 100644 data/2020/neurips/Neurosymbolic Reinforcement Learning with Formally Verified Exploration
 create mode 100644 data/2020/neurips/Neurosymbolic Transformers for Multi-Agent Communication
 create mode 100644 data/2020/neurips/Neutralizing Self-Selection Bias in Sampling for Sortition
 create mode 100644 data/2020/neurips/Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
 create mode 100644 data/2020/neurips/No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems
 create mode 100644 data/2020/neurips/No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium
 create mode 100644 data/2020/neurips/No-Regret Learning and Mixed Nash Equilibria: They Do Not Mix
 create mode 100644 data/2020/neurips/No-regret Learning in Price Competitions under Consumer Reference Effects
 create mode 100644 data/2020/neurips/Node Classification on Graphs with Few-Shot Novel Labels via Meta Transformed Network Embedding
 create mode 100644 data/2020/neurips/Node Embeddings and Exact Low-Rank Representations of Complex Networks
 create mode 100644 data/2020/neurips/Noise-Contrastive Estimation for Multivariate Point Processes
 create mode 100644 data/2020/neurips/Noise2Same: Optimizing A Self-Supervised Bound for Image Denoising
 create mode 100644 data/2020/neurips/Non-Convex SGD Learns Halfspaces with Adversarial Label Noise
 create mode 100644 data/2020/neurips/Non-Crossing Quantile Regression for Distributional Reinforcement Learning
 create mode 100644 data/2020/neurips/Non-Euclidean Universal Approximation
 create mode 100644 data/2020/neurips/Non-Stochastic Control with Bandit Feedback
 create mode 100644 data/2020/neurips/Non-parametric Models for Non-negative Functions
 create mode 100644 data/2020/neurips/Non-reversible Gaussian processes for identifying latent dynamical structure in neural data
 create mode 100644 data/2020/neurips/Nonasymptotic Guarantees for Spiked Matrix Recovery with Generative Priors
 create mode 100644 data/2020/neurips/Nonconvex Sparse Graph Learning under Laplacian Constrained Graphical Model
 create mode 100644 data/2020/neurips/Normalizing Kalman Filters for Multivariate Time Series Analysis
 create mode 100644 data/2020/neurips/Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning
 create mode 100644 data/2020/neurips/Novelty Search in Representational Space for Sample Efficient Exploration
 create mode 100644 data/2020/neurips/Numerically Solving Parametric Families of High-Dimensional Kolmogorov Partial Differential Equations via Deep Learning
 create mode 100644 data/2020/neurips/O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
 create mode 100644 data/2020/neurips/OOD-MAML: Meta-Learning for Few-Shot Out-of-Distribution Detection and Classification
 create mode 100644 data/2020/neurips/OTLDA: A Geometry-aware Optimal Transport Approach for Topic Modeling
 create mode 100644 data/2020/neurips/Object Goal Navigation using Goal-Oriented Semantic Exploration
 create mode 100644 data/2020/neurips/Object-Centric Learning with Slot Attention
 create mode 100644 data/2020/neurips/Ode to an ODE
 create mode 100644 data/2020/neurips/Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
 create mode 100644 data/2020/neurips/Off-Policy Evaluation via the Regularized Lagrangian
 create mode 100644 data/2020/neurips/Off-Policy Imitation Learning from Observations
 create mode 100644 data/2020/neurips/Off-Policy Interval Estimation with Lipschitz Value Iteration
 create mode 100644 data/2020/neurips/Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
 create mode 100644 data/2020/neurips/Offline Imitation Learning with a Misspecified Simulator
 create mode 100644 data/2020/neurips/On 1 n neural representation and robustness
 create mode 100644 data/2020/neurips/On Adaptive Attacks to Adversarial Example Defenses
 create mode 100644 data/2020/neurips/On Adaptive Distance Estimation
 create mode 100644 data/2020/neurips/On Completeness-aware Concept-Based Explanations in Deep Neural Networks
 create mode 100644 data/2020/neurips/On Convergence and Generalization of Dropout Training
 create mode 100644 data/2020/neurips/On Convergence of Nearest Neighbor Classifiers over Feature Transformations
 create mode 100644 data/2020/neurips/On Correctness of Automatic Differentiation for Non-Differentiable Functions
 create mode 100644 data/2020/neurips/On Efficiency in Hierarchical Reinforcement Learning
 create mode 100644 data/2020/neurips/On Infinite-Width Hypernetworks
 create mode 100644 data/2020/neurips/On Learning Ising Models under Huber's Contamination Model
 create mode 100644 data/2020/neurips/On Numerosity of Deep Neural Networks
 create mode 100644 data/2020/neurips/On Power Laws in Deep Ensembles
 create mode 100644 data/2020/neurips/On Regret with Multiple Best Arms
 create mode 100644 data/2020/neurips/On Reward-Free Reinforcement Learning with Linear Function Approximation
 create mode 100644 data/2020/neurips/On Second Order Behaviour in Augmented Neural ODEs
 create mode 100644 data/2020/neurips/On Testing of Samplers
 create mode 100644 data/2020/neurips/On Uniform Convergence and Low-Norm Interpolation Learning
 create mode 100644 data/2020/neurips/On Warm-Starting Neural Network Training
 create mode 100644 data/2020/neurips/On ranking via sorting by estimated expected utility
 create mode 100644 data/2020/neurips/On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems
 create mode 100644 data/2020/neurips/On the Convergence of Smooth Regularized Approximate Value Iteration Schemes
 create mode 100644 data/2020/neurips/On the Equivalence between Online and Private Learnability beyond Binary Classification
 create mode 100644 data/2020/neurips/On the Ergodicity, Bias and Asymptotic Normality of Randomized Midpoint Sampling Method
 create mode 100644 data/2020/neurips/On the Error Resistance of Hinge-Loss Minimization
 create mode 100644 data/2020/neurips/On the Expressiveness of Approximate Inference in Bayesian Neural Networks
 create mode 100644 data/2020/neurips/On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them
 create mode 100644 data/2020/neurips/On the Modularity of Hypernetworks
 create mode 100644 data/2020/neurips/On the Power of Louvain in the Stochastic Block Model
 create mode 100644 data/2020/neurips/On the Role of Sparsity and DAG Constraints for Learning Linear DAGs
 create mode 100644 data/2020/neurips/On the Similarity between the Laplace and Neural Tangent Kernels
 create mode 100644 data/2020/neurips/On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems
 create mode 100644 data/2020/neurips/On the Theory of Transfer Learning: The Importance of Task Diversity
 create mode 100644 data/2020/neurips/On the Tightness of Semidefinite Relaxations for Certifying Robustness to Adversarial Examples
 create mode 100644 data/2020/neurips/On the Trade-off between Adversarial and Backdoor Robustness
 create mode 100644 data/2020/neurips/On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
 create mode 100644 data/2020/neurips/On the distance between two neural networks and the stability of learning
 create mode 100644 data/2020/neurips/On the equivalence of molecular graph convolution and molecular wave function with poor basis set
 create mode 100644 data/2020/neurips/On the linearity of large non-linear models: when and why the tangent kernel is constant
 create mode 100644 data/2020/neurips/Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free
 create mode 100644 data/2020/neurips/One Ring to Rule Them All: Certifiably Robust Geometric Perception with Outliers
 create mode 100644 data/2020/neurips/One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL
 create mode 100644 data/2020/neurips/One-bit Supervision for Image Classification
 create mode 100644 data/2020/neurips/One-sample Guided Object Representation Disassembling
 create mode 100644 data/2020/neurips/Online Agnostic Boosting via Regret Minimization
 create mode 100644 data/2020/neurips/Online Algorithm for Unsupervised Sequential Selection with Contextual Information
 create mode 100644 data/2020/neurips/Online Algorithms for Multi-shop Ski Rental with Machine Learned Advice
 create mode 100644 data/2020/neurips/Online Bayesian Goal Inference for Boundedly Rational Planning Agents
 create mode 100644 data/2020/neurips/Online Bayesian Persuasion
 create mode 100644 data/2020/neurips/Online Convex Optimization Over Erdos-Renyi Random Networks
 create mode 100644 data/2020/neurips/Online Decision Based Visual Tracking via Reinforcement Learning
 create mode 100644 data/2020/neurips/Online Fast Adaptation and Knowledge Accumulation (OSAKA): a New Approach to Continual Learning
 create mode 100644 data/2020/neurips/Online Influence Maximization under Linear Threshold Model
 create mode 100644 data/2020/neurips/Online Learning in Contextual Bandits using Gated Linear Networks
 create mode 100644 data/2020/neurips/Online Learning with Primary and Secondary Losses
 create mode 100644 data/2020/neurips/Online Linear Optimization with Many Hints
 create mode 100644 data/2020/neurips/Online MAP Inference of Determinantal Point Processes
 create mode 100644 data/2020/neurips/Online Matrix Completion with Side Information
 create mode 100644 data/2020/neurips/Online Meta-Critic Learning for Off-Policy Actor-Critic Methods
 create mode 100644 data/2020/neurips/Online Multitask Learning with Long-Term Memory
 create mode 100644 data/2020/neurips/Online Neural Connectivity Estimation with Noisy Group Testing
 create mode 100644 data/2020/neurips/Online Non-Convex Optimization with Imperfect Feedback
 create mode 100644 data/2020/neurips/Online Optimization with Memory and Competitive Control
 create mode 100644 data/2020/neurips/Online Planning with Lookahead Policies
 create mode 100644 data/2020/neurips/Online Robust Regression via SGD on the l1 loss
 create mode 100644 data/2020/neurips/Online Sinkhorn: Optimal Transport distances from sample streams
 create mode 100644 data/2020/neurips/Online Structured Meta-learning
 create mode 100644 data/2020/neurips/Online learning with dynamics: A minimax perspective
 create mode 100644 data/2020/neurips/Open Graph Benchmark: Datasets for Machine Learning on Graphs
 create mode 100644 data/2020/neurips/Optimal Adaptive Electrode Selection to Maximize Simultaneously Recorded Neuron Yield
 create mode 100644 data/2020/neurips/Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
 create mode 100644 data/2020/neurips/Optimal Approximation - Smoothness Tradeoffs for Soft-Max Functions
 create mode 100644 data/2020/neurips/Optimal Best-arm Identification in Linear Bandits
 create mode 100644 data/2020/neurips/Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization
 create mode 100644 data/2020/neurips/Optimal Iterative Sketching Methods with the Subsampled Randomized Hadamard Transform
 create mode 100644 data/2020/neurips/Optimal Learning from Verified Training Data
 create mode 100644 data/2020/neurips/Optimal Lottery Tickets via Subset Sum: Logarithmic Over-Parameterization is Sufficient
 create mode 100644 data/2020/neurips/Optimal Prediction of the Number of Unseen Species with Multiplicity
 create mode 100644 data/2020/neurips/Optimal Private Median Estimation under Minimal Distributional Assumptions
 create mode 100644 data/2020/neurips/Optimal Query Complexity of Secure Stochastic Convex Optimization
 create mode 100644 data/2020/neurips/Optimal Robustness-Consistency Trade-offs for Learning-Augmented Online Algorithms
 create mode 100644 data/2020/neurips/Optimal Variance Control of the Score-Function Gradient Estimator for Importance-Weighted Bounds
 create mode 100644 data/2020/neurips/Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization
 create mode 100644 data/2020/neurips/Optimally Deceiving a Learning Leader in Stackelberg Games
 create mode 100644 data/2020/neurips/Optimistic Dual Extrapolation for Coherent Non-monotone Variational Inequalities
 create mode 100644 data/2020/neurips/Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks
 create mode 100644 data/2020/neurips/Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions
 create mode 100644 data/2020/neurips/Optimizing Mode Connectivity via Neuron Alignment
 create mode 100644 data/2020/neurips/Optimizing Neural Networks via Koopman Operator Theory
 create mode 100644 data/2020/neurips/OrganITE: Optimal transplant donor organ offering using an individual treatment effect
 create mode 100644 data/2020/neurips/Organizing recurrent network dynamics by task-computation to enable continual learning
 create mode 100644 data/2020/neurips/Outlier Robust Mean Estimation with Subgaussian Rates via Stability
 create mode 100644 data/2020/neurips/Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
 create mode 100644 data/2020/neurips/Overfitting Can Be Harmless for Basis Pursuit, But Only to a Degree
 create mode 100644 data/2020/neurips/PAC-Bayes Analysis Beyond the Usual Bounds
 create mode 100644 data/2020/neurips/PAC-Bayes Learning Bounds for Sample-Dependent Priors
 create mode 100644 data/2020/neurips/PAC-Bayesian Bound for the Conditional Value at Risk
 create mode 100644 data/2020/neurips/PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
 create mode 100644 data/2020/neurips/PEP: Parameter Ensembling by Perturbation
 create mode 100644 data/2020/neurips/PGM-Explainer: Probabilistic Graphical Model Explanations for Graph Neural Networks
 create mode 100644 data/2020/neurips/PIE-NET: Parametric Inference of Point Cloud Edges
 create mode 100644 data/2020/neurips/PLANS: Neuro-Symbolic Program Learning from Videos
 create mode 100644 data/2020/neurips/PLLay: Efficient Topological Layer based on Persistent Landscapes
 create mode 100644 data/2020/neurips/POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis
 create mode 100644 data/2020/neurips/POMDPs in Continuous Time and Discrete Spaces
 create mode 100644 data/2020/neurips/POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
 create mode 100644 data/2020/neurips/PRANK: motion Prediction based on RANKing
 create mode 100644 data/2020/neurips/Parabolic Approximation Line Search for DNNs
 create mode 100644 data/2020/neurips/Parameterized Explainer for Graph Neural Network
 create mode 100644 data/2020/neurips/Parametric Instance Classification for Unsupervised Visual Feature learning
 create mode 100644 data/2020/neurips/Part-dependent Label Noise: Towards Instance-dependent Label Noise
 create mode 100644 data/2020/neurips/Partially View-aligned Clustering
 create mode 100644 "data/2020/neurips/Patch2Self: Denoising Diffusion MRI with Self-Supervised Learning\342\200\213"
 create mode 100644 data/2020/neurips/Path Integral Based Convolution and Pooling for Graph Neural Networks
 create mode 100644 data/2020/neurips/Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks
 create mode 100644 data/2020/neurips/Penalized Langevin dynamics with vanishing penalty for smooth and log-concave targets
 create mode 100644 data/2020/neurips/Permute-and-Flip: A new mechanism for differentially private selection
 create mode 100644 data/2020/neurips/Personalized Federated Learning with Moreau Envelopes
 create mode 100644 data/2020/neurips/Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach
 create mode 100644 data/2020/neurips/Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability
 create mode 100644 data/2020/neurips/Phase retrieval in high dimensions: Statistical and computational phase transitions
 create mode 100644 data/2020/neurips/Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games
 create mode 100644 data/2020/neurips/Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation
 create mode 100644 data/2020/neurips/PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals
 create mode 100644 data/2020/neurips/Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
 create mode 100644 data/2020/neurips/Planning with General Objective Functions: Going Beyond Total Rewards
 create mode 100644 data/2020/neurips/Point process models for sequence detection in high-dimensional neural spike trains
 create mode 100644 data/2020/neurips/Pointer Graph Networks
 create mode 100644 data/2020/neurips/Policy Improvement via Imitation of Multiple Oracles
 create mode 100644 data/2020/neurips/Polynomial-Time Computation of Optimal Correlated Equilibria in Two-Player Extensive-Form Games with Public Chance Moves and Beyond
 create mode 100644 data/2020/neurips/Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
 create mode 100644 data/2020/neurips/Position-based Scaled Gradient for Model Quantization and Pruning
 create mode 100644 data/2020/neurips/Post-training Iterative Hierarchical Data Augmentation for Deep Networks
 create mode 100644 data/2020/neurips/Posterior Network: Uncertainty Estimation without OOD Samples via Density-Based Pseudo-Counts
 create mode 100644 data/2020/neurips/Posterior Re-calibration for Imbalanced Datasets
 create mode 100644 data/2020/neurips/Practical Low-Rank Communication Compression in Decentralized Deep Learning
 create mode 100644 data/2020/neurips/Practical No-box Adversarial Attacks against DNNs
 create mode 100644 data/2020/neurips/Practical Quasi-Newton Methods for Training Deep Neural Networks
 create mode 100644 data/2020/neurips/Pre-training via Paraphrasing
 create mode 100644 data/2020/neurips/Precise expressions for random projections: Low-rank approximation and randomized Newton
 create mode 100644 data/2020/neurips/Predicting Training Time Without Training
 create mode 100644 data/2020/neurips/Prediction with Corrupted Expert Advice
 create mode 100644 data/2020/neurips/Predictive Information Accelerates Learning in RL
 create mode 100644 data/2020/neurips/Predictive coding in balanced neural networks with noise, chaos and delays
 create mode 100644 data/2020/neurips/Predictive inference is free with the jackknife+-after-bootstrap
 create mode 100644 data/2020/neurips/Preference learning along multiple criteria: A game-theoretic perspective
 create mode 100644 data/2020/neurips/Preference-based Reinforcement Learning with Finite-Time Guarantees
 create mode 100644 data/2020/neurips/Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm
 create mode 100644 data/2020/neurips/Primal-Dual Mesh Convolutional Neural Networks
 create mode 100644 data/2020/neurips/Principal Neighbourhood Aggregation for Graph Nets
 create mode 100644 data/2020/neurips/Privacy Amplification via Random Check-Ins
 create mode 100644 data/2020/neurips/Private Identity Testing for High-Dimensional Distributions
 create mode 100644 data/2020/neurips/Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity
 create mode 100644 data/2020/neurips/Probabilistic Active Meta-Learning
 create mode 100644 
data/2020/neurips/Probabilistic Circuits for Variational Inference in Discrete Graphical Models create mode 100644 data/2020/neurips/Probabilistic Fair Clustering create mode 100644 data/2020/neurips/Probabilistic Inference with Algebraic Constraints: Theoretical Limits and Practical Approximations create mode 100644 data/2020/neurips/Probabilistic Linear Solvers for Machine Learning create mode 100644 data/2020/neurips/Probabilistic Orientation Estimation with Matrix Fisher Distributions create mode 100644 data/2020/neurips/Probabilistic Time Series Forecasting with Shape and Temporal Diversity create mode 100644 data/2020/neurips/Probably Approximately Correct Constrained Learning create mode 100644 data/2020/neurips/Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Distributions create mode 100644 data/2020/neurips/Program Synthesis with Pragmatic Communication create mode 100644 data/2020/neurips/Projected Stein Variational Gradient Descent create mode 100644 data/2020/neurips/Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method create mode 100644 data/2020/neurips/Projection Robust Wasserstein Distance and Riemannian Optimization create mode 100644 data/2020/neurips/Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning create mode 100644 data/2020/neurips/Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method create mode 100644 data/2020/neurips/Prophet Attention: Predicting Attention with Future Attention create mode 100644 data/2020/neurips/Provable Online CP PARAFAC Decomposition of a Structured Tensor via Dictionary Learning create mode 100644 data/2020/neurips/Provable Overlapping Community Detection in Weighted Graphs create mode 100644 data/2020/neurips/Provably Consistent Partial-Label Learning create mode 100644 data/2020/neurips/Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning create mode 100644 data/2020/neurips/Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach create mode 100644 data/2020/neurips/Provably Efficient Neural GTD for Off-Policy Learning create mode 100644 data/2020/neurips/Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits create mode 100644 data/2020/neurips/Provably Efficient Reinforcement Learning with Kernel and Neural Function Approximations create mode 100644 data/2020/neurips/Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration create mode 100644 data/2020/neurips/Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration create mode 100644 data/2020/neurips/Provably Robust Metric Learning create mode 100644 data/2020/neurips/Provably adaptive reinforcement learning in metric spaces create mode 100644 data/2020/neurips/Proximal Mapping for Deep Regularization create mode 100644 data/2020/neurips/Proximity Operator of the Matrix Perspective Function and its Applications create mode 100644 data/2020/neurips/Pruning Filter in Filter create mode 100644 data/2020/neurips/Pruning neural networks without any data by iteratively conserving synaptic flow create mode 100644 data/2020/neurips/Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point create mode 100644 data/2020/neurips/PyGlove: Symbolic Programming for Automated Machine Learning create mode 100644 data/2020/neurips/Quantifying Learnability and Describability of 
Visual Concepts Emerging in Representation Learning create mode 100644 data/2020/neurips/Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality create mode 100644 data/2020/neurips/Quantile Propagation for Wasserstein-Approximate Gaussian Processes create mode 100644 data/2020/neurips/Quantitative Propagation of Chaos for SGD in Wide Neural Networks create mode 100644 data/2020/neurips/Quantized Variational Inference create mode 100644 data/2020/neurips/R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making create mode 100644 data/2020/neurips/RANet: Region Attention Network for Semantic Segmentation create mode 100644 data/2020/neurips/RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning create mode 100644 data/2020/neurips/RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces create mode 100644 data/2020/neurips/RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning create mode 100644 data/2020/neurips/RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference create mode 100644 data/2020/neurips/RSKDD-Net: Random Sample-based Keypoint Detector and Descriptor create mode 100644 data/2020/neurips/RandAugment: Practical Automated Data Augmentation with a Reduced Search Space create mode 100644 data/2020/neurips/Random Reshuffling is Not Always Better create mode 100644 data/2020/neurips/Random Reshuffling: Simple Analysis with Vast Improvements create mode 100644 data/2020/neurips/Random Walk Graph Neural Networks create mode 100644 data/2020/neurips/Randomized tests for high-dimensional regression: A more efficient and powerful solution create mode 100644 data/2020/neurips/Rankmax: An Adaptive Projection Alternative to the Softmax Function create mode 100644 data/2020/neurips/Ratio Trace Formulation of Wasserstein Discriminant Analysis create mode 100644 data/2020/neurips/Rational neural networks create mode 100644 data/2020/neurips/Re-Examining Linear Embeddings for High-Dimensional Bayesian Optimization create mode 100644 data/2020/neurips/Real World Games Look Like Spinning Tops create mode 100644 data/2020/neurips/Reasoning about Uncertainties in Discrete-Time Dynamical Systems using Polynomial Forms create mode 100644 data/2020/neurips/Reciprocal Adversarial Learning via Characteristic Functions create mode 100644 data/2020/neurips/Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate create mode 100644 data/2020/neurips/Reconsidering Generative Objectives For Counterfactual Reasoning create mode 100644 data/2020/neurips/Reconstructing Perceptive Images from Brain Activity by Shape-Semantic GAN create mode 100644 data/2020/neurips/Recovery of sparse linear classifiers from mixture of responses create mode 100644 data/2020/neurips/Recurrent Quantum Neural Networks create mode 100644 data/2020/neurips/Recurrent Switching Dynamical Systems Models for Multiple Interacting Neural Populations create mode 100644 data/2020/neurips/Recursive Inference for Variational Autoencoders create mode 100644 data/2020/neurips/Reducing Adversarially Robust Learning to Non-Robust PAC Learning create mode 100644 data/2020/neurips/Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals create mode 100644 data/2020/neurips/Regression with reject option and application to kNN create mode 100644 data/2020/neurips/Regret Bounds without 
Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses create mode 100644 data/2020/neurips/Regret in Online Recommendation Systems create mode 100644 data/2020/neurips/Regularized linear autoencoders recover the principal components, eventually create mode 100644 data/2020/neurips/Regularizing Black-box Models for Improved Interpretability create mode 100644 data/2020/neurips/Regularizing Towards Permutation Invariance In Recurrent Models create mode 100644 data/2020/neurips/Reinforced Molecular Optimization with Neighborhood-Controlled Grammars create mode 100644 data/2020/neurips/Reinforcement Learning for Control with Multiple Frequencies create mode 100644 data/2020/neurips/Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting create mode 100644 data/2020/neurips/Reinforcement Learning with Augmented Data create mode 100644 data/2020/neurips/Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing create mode 100644 data/2020/neurips/Reinforcement Learning with Feedback Graphs create mode 100644 data/2020/neurips/Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension create mode 100644 data/2020/neurips/Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D create mode 100644 data/2020/neurips/RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder create mode 100644 data/2020/neurips/Relative gradient optimization of the Jacobian term in unsupervised deep learning create mode 100644 data/2020/neurips/Reliable Graph Neural Networks via Robust Aggregation create mode 100644 data/2020/neurips/Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies create mode 100644 data/2020/neurips/RepPoints v2: Verification Meets Regression for Object Detection create mode 100644 data/2020/neurips/Reparameterizing Mirror Descent as Gradient Descent create mode 100644 "data/2020/neurips/Replica-Exchange Nos\303\251-Hoover Dynamics for Bayesian Learning on Large Datasets" create mode 100644 data/2020/neurips/Representation Learning for Integrating Multi-domain Outcomes to Optimize Individualized Treatment create mode 100644 data/2020/neurips/Rescuing neural spike train models from bad MLE create mode 100644 data/2020/neurips/Reservoir Computing meets Recurrent Kernels and Structured Transforms create mode 100644 data/2020/neurips/Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts create mode 100644 data/2020/neurips/Residual Force Control for Agile Human Behavior Imitation and Extended Motion Synthesis create mode 100644 data/2020/neurips/Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits create mode 100644 data/2020/neurips/Restoring Negative Information in Few-Shot Object Detection create mode 100644 data/2020/neurips/Rethinking Importance Weighting for Deep Learning under Distribution Shift create mode 100644 data/2020/neurips/Rethinking Learnable Tree Filter for Generic Feature Transform create mode 100644 data/2020/neurips/Rethinking Pre-training and Self-training create mode 100644 data/2020/neurips/Rethinking pooling in graph neural networks create mode 100644 data/2020/neurips/Rethinking the Value of Labels for Improving Class-Imbalanced Learning create mode 100644 data/2020/neurips/Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks create mode 
100644 data/2020/neurips/RetroXpert: Decompose Retrosynthesis Prediction Like A Chemist create mode 100644 data/2020/neurips/Reverse-engineering recurrent neural network solutions to a hierarchical inference task for mice create mode 100644 data/2020/neurips/Revisiting Frank-Wolfe for Polytopes: Strict Complementarity and Sparsity create mode 100644 data/2020/neurips/Revisiting Parameter Sharing for Automatic Neural Channel Number Search create mode 100644 data/2020/neurips/Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes create mode 100644 data/2020/neurips/Reward Propagation Using Graph Convolutional Networks create mode 100644 data/2020/neurips/Reward-rational (implicit) choice: A unifying formalism for reward learning create mode 100644 data/2020/neurips/Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement create mode 100644 data/2020/neurips/Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian create mode 100644 data/2020/neurips/Riemannian Continuous Normalizing Flows create mode 100644 data/2020/neurips/Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret create mode 100644 data/2020/neurips/Robust Correction of Sampling Bias using Cumulative Distribution Functions create mode 100644 data/2020/neurips/Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations create mode 100644 data/2020/neurips/Robust Density Estimation under Besov IPM Losses create mode 100644 data/2020/neurips/Robust Disentanglement of a Few Factors at a Time create mode 100644 data/2020/neurips/Robust Federated Learning: The Case of Affine Distribution Shifts create mode 100644 data/2020/neurips/Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time create mode 100644 data/2020/neurips/Robust Meta-learning for Mixed Linear Regression with Small Batches create mode 100644 data/2020/neurips/Robust Multi-Agent Reinforcement Learning with Model Uncertainty create mode 100644 data/2020/neurips/Robust Multi-Object Matching via Iterative Reweighting of the Graph Connection Laplacian create mode 100644 data/2020/neurips/Robust Optimal Transport with Applications in Generative Modeling and Domain Adaptation create mode 100644 data/2020/neurips/Robust Optimization for Fairness with Noisy Protected Groups create mode 100644 data/2020/neurips/Robust Persistence Diagrams using Reproducing Kernels create mode 100644 data/2020/neurips/Robust Pre-Training by Adversarial Contrastive Learning create mode 100644 data/2020/neurips/Robust Quantization: One Model to Rule Them All create mode 100644 data/2020/neurips/Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization create mode 100644 data/2020/neurips/Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification create mode 100644 data/2020/neurips/Robust Reinforcement Learning via Adversarial training with Langevin Dynamics create mode 100644 data/2020/neurips/Robust Sequence Submodular Maximization create mode 100644 data/2020/neurips/Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing create mode 100644 data/2020/neurips/Robust and Heavy-Tailed Mean Estimation Made Simple, via Regret Minimization create mode 100644 data/2020/neurips/Robust compressed sensing using generative models create mode 100644 data/2020/neurips/Robust large-margin learning in hyperbolic space create mode 100644 
data/2020/neurips/Robust, Accurate Stochastic Optimization for Variational Inference create mode 100644 data/2020/neurips/Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs create mode 100644 data/2020/neurips/Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations create mode 100644 data/2020/neurips/Robustness of Bayesian Neural Networks to Gradient-Based Attacks create mode 100644 data/2020/neurips/Robustness of Community Detection to Random Geometric Perturbations create mode 100644 data/2020/neurips/Rotated Binary Neural Network create mode 100644 data/2020/neurips/Rotation-Invariant Local-to-Global Representation Learning for 3D Point Cloud create mode 100644 data/2020/neurips/SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection create mode 100644 data/2020/neurips/SCOP: Scientific Control for Reliable Neural Network Pruning create mode 100644 data/2020/neurips/SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images create mode 100644 data/2020/neurips/SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks create mode 100644 data/2020/neurips/SEVIR : A Storm Event Imagery Dataset for Deep Learning Applications in Radar and Satellite Meteorology create mode 100644 data/2020/neurips/SGD with shuffling: optimal rates without component convexity and large epoch requirements create mode 100644 data/2020/neurips/SIRI: Spatial Relation Induced Network For Spatial Description Resolution create mode 100644 data/2020/neurips/SLIP: Learning to predict in unknown dynamical systems with long-term memory create mode 100644 data/2020/neurips/SMYRF - Efficient Attention using Asymmetric Clustering create mode 100644 data/2020/neurips/SOLOv2: Dynamic and Fast Instance Segmentation create mode 100644 data/2020/neurips/STEER : Simple Temporal Regularization For Neural ODE create mode 100644 data/2020/neurips/STLnet: Signal Temporal Logic Enforced Multivariate Recurrent Neural Networks create mode 100644 data/2020/neurips/SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm create mode 100644 data/2020/neurips/SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence create mode 100644 data/2020/neurips/Safe Reinforcement Learning via Curriculum Induction create mode 100644 data/2020/neurips/Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction create mode 100644 data/2020/neurips/Sample Complexity of Uniform Convergence for Multicalibration create mode 100644 data/2020/neurips/Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation create mode 100644 data/2020/neurips/Sample complexity and effective dimension for regression on manifolds create mode 100644 data/2020/neurips/Sample-Efficient Optimization in the Latent Space of Deep Generative Models via Weighted Retraining create mode 100644 data/2020/neurips/Sample-Efficient Reinforcement Learning of Undercomplete POMDPs create mode 100644 data/2020/neurips/Sampling from a k-DPP without looking at all items create mode 100644 data/2020/neurips/Sampling-Decomposable Generative Adversarial Recommender create mode 100644 data/2020/neurips/Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot create mode 100644 data/2020/neurips/Scalable Belief Propagation via Relaxed Scheduling create mode 100644 data/2020/neurips/Scalable Graph Neural Networks via Bidirectional Propagation create mode 100644 data/2020/neurips/Scalable Multi-Agent Reinforcement 
Learning for Networked Systems with Average Reward create mode 100644 data/2020/neurips/ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training create mode 100644 data/2020/neurips/Scattering GCN: Overcoming Oversmoothness in Graph Convolutional Networks create mode 100644 data/2020/neurips/Searching for Low-Bit Weights in Quantized Neural Networks create mode 100644 data/2020/neurips/Second Order Optimality in Decentralized Non-Convex Optimization via Perturbed Gradient Tracking create mode 100644 data/2020/neurips/Second Order PAC-Bayesian Bounds for the Weighted Majority Vote create mode 100644 data/2020/neurips/Secretary and Online Matching Problems with Machine Learned Advice create mode 100644 data/2020/neurips/Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms create mode 100644 data/2020/neurips/See, Hear, Explore: Curiosity via Audio-Visual Association create mode 100644 data/2020/neurips/Self-Adaptive Training: beyond Empirical Risk Minimization create mode 100644 "data/2020/neurips/Self-Adaptively Learning to Demoir\303\251 from Focused and Defocused Image Pairs" create mode 100644 data/2020/neurips/Self-Distillation Amplifies Regularization in Hilbert Space create mode 100644 data/2020/neurips/Self-Distillation as Instance-Specific Label Smoothing create mode 100644 data/2020/neurips/Self-Imitation Learning via Generalized Lower Bound Q-learning create mode 100644 data/2020/neurips/Self-Learning Transformations for Improving Gaze and Head Redirection create mode 100644 data/2020/neurips/Self-Paced Deep Reinforcement Learning create mode 100644 data/2020/neurips/Self-Supervised Few-Shot Learning on Point Clouds create mode 100644 data/2020/neurips/Self-Supervised Generative Adversarial Compression create mode 100644 data/2020/neurips/Self-Supervised Learning by Cross-Modal Audio-Video Clustering create mode 100644 data/2020/neurips/Self-Supervised MultiModal Versatile Networks create mode 100644 data/2020/neurips/Self-Supervised Relational Reasoning for Representation Learning create mode 100644 data/2020/neurips/Self-Supervised Relationship Probing create mode 100644 data/2020/neurips/Self-Supervised Visual Representation Learning from Hierarchical Grouping create mode 100644 data/2020/neurips/Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID create mode 100644 data/2020/neurips/Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous Graphs create mode 100644 data/2020/neurips/Self-supervised Co-Training for Video Representation Learning create mode 100644 data/2020/neurips/Self-supervised learning through the eyes of a child create mode 100644 data/2020/neurips/Self-training Avoids Using Spurious Features Under Domain Shift create mode 100644 data/2020/neurips/Semantic Visual Navigation by Watching YouTube Videos create mode 100644 data/2020/neurips/Semi-Supervised Neural Architecture Search create mode 100644 data/2020/neurips/Semi-Supervised Partial Label Learning via Confidence-Rated Margin Maximization create mode 100644 data/2020/neurips/Semialgebraic Optimization for Lipschitz Constants of ReLU Networks create mode 100644 data/2020/neurips/Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding create mode 100644 data/2020/neurips/Sequential Bayesian Experimental Design with Variable Cost Structure create mode 100644 data/2020/neurips/Set2Graph: Learning Graphs From Sets create mode 100644 data/2020/neurips/ShapeFlow: 
Learnable Deformation Flows Among 3D Shapes create mode 100644 data/2020/neurips/Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning create mode 100644 data/2020/neurips/Shared Space Transfer Learning for analyzing multi-site fMRI data create mode 100644 data/2020/neurips/Sharp Representation Theorems for ReLU Networks with Precise Dependence on Depth create mode 100644 data/2020/neurips/Sharp uniform convergence bounds through empirical centralization create mode 100644 data/2020/neurips/Sharpened Generalization Bounds based on Conditional Mutual Information and an Application to Noisy, Iterative Algorithms create mode 100644 data/2020/neurips/Sharper Generalization Bounds for Pairwise Learning create mode 100644 data/2020/neurips/ShiftAddNet: A Hardware-Inspired Deep Network create mode 100644 data/2020/neurips/Simple and Fast Algorithm for Binary Integer and Online Linear Programming create mode 100644 data/2020/neurips/Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness create mode 100644 data/2020/neurips/Simple and Scalable Sparse k-means Clustering via Feature Ranking create mode 100644 data/2020/neurips/Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering create mode 100644 data/2020/neurips/Simulating a Primary Visual Cortex at the Front of CNNs Improves Robustness to Image Perturbations create mode 100644 data/2020/neurips/Simultaneous Preference and Metric Learning from Paired Comparisons create mode 100644 data/2020/neurips/Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition create mode 100644 data/2020/neurips/Sinkhorn Barycenter via Functional Gradient Descent create mode 100644 data/2020/neurips/Sinkhorn Natural Gradient for Generative Models create mode 100644 data/2020/neurips/Skeleton-bridged Point Completion: From Global Inference to Local Adjustment create mode 100644 data/2020/neurips/Sliding Window Algorithms for k-Clustering Problems create mode 100644 data/2020/neurips/Small Nash Equilibrium Certificates in Very Large Games create mode 100644 data/2020/neurips/Smooth And Consistent Probabilistic Regression Trees create mode 100644 data/2020/neurips/Smoothed Analysis of Online and Differentially Private Learning create mode 100644 data/2020/neurips/Smoothed Geometry for Robust Attribution create mode 100644 data/2020/neurips/Smoothly Bounding User Contributions in Differential Privacy create mode 100644 data/2020/neurips/SnapBoost: A Heterogeneous Boosting Machine create mode 100644 data/2020/neurips/Soft Contrastive Learning for Visual Localization create mode 100644 data/2020/neurips/SoftFlow: Probabilistic Framework for Normalizing Flow on Manifolds create mode 100644 data/2020/neurips/Softmax Deep Double Deterministic Policy Gradients create mode 100644 data/2020/neurips/Solver-in-the-Loop: Learning from Differentiable Physics to Interact with Iterative PDE-Solvers create mode 100644 data/2020/neurips/Space-Time Correspondence as a Contrastive Random Walk create mode 100644 data/2020/neurips/Sparse Graphical Memory for Robust Planning create mode 100644 data/2020/neurips/Sparse Learning with CART create mode 100644 data/2020/neurips/Sparse Spectrum Warped Input Measures for Nonstationary Kernel Learning create mode 100644 data/2020/neurips/Sparse Symplectically Integrated Neural Networks create mode 100644 data/2020/neurips/Sparse Weight Activation Training create mode 100644 data/2020/neurips/Sparse and Continuous Attention Mechanisms 
create mode 100644 data/2020/neurips/Spectra of the Conjugate Kernel and Neural Tangent Kernel for linear-width neural networks create mode 100644 data/2020/neurips/Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting create mode 100644 data/2020/neurips/Spike and slab variational Bayes for high dimensional logistic regression create mode 100644 data/2020/neurips/Spin-Weighted Spherical CNNs create mode 100644 data/2020/neurips/Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses create mode 100644 data/2020/neurips/Stable and expressive recurrent vision models create mode 100644 data/2020/neurips/Stage-wise Conservative Linear Bandits create mode 100644 data/2020/neurips/Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes create mode 100644 data/2020/neurips/Stationary Activations for Uncertainty Calibration in Deep Learning create mode 100644 data/2020/neurips/Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits create mode 100644 data/2020/neurips/Statistical Guarantees of Distributed Nearest Neighbor Classification create mode 100644 data/2020/neurips/Statistical Optimal Transport posed as Learning Kernel Embedding create mode 100644 data/2020/neurips/Statistical and Topological Properties of Sliced Probability Divergences create mode 100644 data/2020/neurips/Statistical control for spatio-temporal MEG EEG source imaging with desparsified mutli-task Lasso create mode 100644 data/2020/neurips/Statistical-Query Lower Bounds via Functional Gradients create mode 100644 data/2020/neurips/Steady State Analysis of Episodic Reinforcement Learning create mode 100644 data/2020/neurips/Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction create mode 100644 data/2020/neurips/Stein Self-Repulsive Dynamics: Benefits From Past Samples create mode 100644 data/2020/neurips/Stochastic Deep Gaussian Processes over Graphs create mode 100644 data/2020/neurips/Stochastic Gradient Descent in Correlated Settings: A Study on Gaussian Processes create mode 100644 data/2020/neurips/Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model create mode 100644 data/2020/neurips/Stochastic Normalization create mode 100644 data/2020/neurips/Stochastic Normalizing Flows create mode 100644 data/2020/neurips/Stochastic Optimization for Performative Prediction create mode 100644 data/2020/neurips/Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping create mode 100644 data/2020/neurips/Stochastic Optimization with Laggard Data Pipelines create mode 100644 data/2020/neurips/Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems create mode 100644 data/2020/neurips/Stochastic Segmentation Networks: Modelling Spatially Correlated Aleatoric Uncertainty create mode 100644 data/2020/neurips/Stochastic Stein Discrepancies create mode 100644 data/2020/neurips/Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function create mode 100644 data/2020/neurips/Storage Efficient and Dynamic Flexible Runtime Channel Pruning via Deep Reinforcement Learning create mode 100644 data/2020/neurips/StratLearner: Learning a Strategy for Misinformation Prevention in Social Networks create mode 100644 data/2020/neurips/Strictly Batch Imitation Learning by Energy-based Distribution Matching create mode 100644 data/2020/neurips/Strongly Incremental 
Constituency Parsing with Graph Neural Networks create mode 100644 data/2020/neurips/Strongly local p-norm-cut algorithms for semi-supervised learning and local graph clustering create mode 100644 data/2020/neurips/Structured Convolutions for Efficient Neural Network Design create mode 100644 data/2020/neurips/Structured Prediction for Conditional Meta-Learning create mode 100644 data/2020/neurips/Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces create mode 100644 data/2020/neurips/Sub-sampling for Efficient Non-Parametric Bandit Exploration create mode 100644 data/2020/neurips/Subgraph Neural Networks create mode 100644 data/2020/neurips/Subgroup-based Rank-1 Lattice Quasi-Monte Carlo create mode 100644 data/2020/neurips/Submodular Maximization Through Barrier Functions create mode 100644 data/2020/neurips/Submodular Meta-Learning create mode 100644 data/2020/neurips/Succinct and Robust Multi-Agent Communication With Temporal Message Control create mode 100644 data/2020/neurips/Sufficient dimension reduction for classification using principal optimal transport direction create mode 100644 data/2020/neurips/SuperLoss: A Generic Loss for Robust Curriculum Learning create mode 100644 data/2020/neurips/Supermasks in Superposition create mode 100644 data/2020/neurips/Supervised Contrastive Learning create mode 100644 data/2020/neurips/SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows create mode 100644 data/2020/neurips/Swapping Autoencoder for Deep Image Manipulation create mode 100644 data/2020/neurips/Synbols: Probing Learning Algorithms with Synthetic Datasets create mode 100644 data/2020/neurips/Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis create mode 100644 data/2020/neurips/Synthesizing Tasks for Block-based Programming create mode 100644 data/2020/neurips/Synthetic Data Generators - Sequential and Private create mode 100644 data/2020/neurips/System Identification with Biophysical Constraints: A Circuit Model of the Inner Retina create mode 100644 data/2020/neurips/TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation create mode 100644 data/2020/neurips/Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization create mode 100644 data/2020/neurips/Taming Discrete Integration via the Boon of Dimensionality create mode 100644 data/2020/neurips/Targeted Adversarial Perturbations for Monocular Depth Prediction create mode 100644 data/2020/neurips/Task-Agnostic Amortized Inference of Gaussian Process Hyperparameters create mode 100644 data/2020/neurips/Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes create mode 100644 data/2020/neurips/Task-Oriented Feature Distillation create mode 100644 data/2020/neurips/Task-Robust Model-Agnostic Meta-Learning create mode 100644 data/2020/neurips/Task-agnostic Exploration in Reinforcement Learning create mode 100644 data/2020/neurips/TaylorGAN: Neighbor-Augmented Policy Update Towards Sample-Efficient Natural Language Generation create mode 100644 data/2020/neurips/Teaching a GAN What Not to Learn create mode 100644 data/2020/neurips/Telescoping Density-Ratio Estimation create mode 100644 data/2020/neurips/Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation create mode 100644 data/2020/neurips/Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks create mode 100644 data/2020/neurips/Temporal 
Variability in Implicit Online Learning create mode 100644 data/2020/neurips/Tensor Completion Made Practical create mode 100644 data/2020/neurips/Testing Determinantal Point Processes create mode 100644 data/2020/neurips/Texture Interpolation for Probing Visual Perception create mode 100644 data/2020/neurips/The Adaptive Complexity of Maximizing a Gross Substitutes Valuation create mode 100644 data/2020/neurips/The Advantage of Conditional Meta-Learning for Biased Regularization and Fine Tuning create mode 100644 data/2020/neurips/The All-or-Nothing Phenomenon in Sparse Tensor PCA create mode 100644 data/2020/neurips/The Autoencoding Variational Autoencoder create mode 100644 data/2020/neurips/The Complete Lasso Tradeoff Diagram create mode 100644 data/2020/neurips/The Complexity of Adversarially Robust Proper Learning of Halfspaces with Agnostic Noise create mode 100644 data/2020/neurips/The Cone of Silence: Speech Separation by Localization create mode 100644 data/2020/neurips/The Convex Relaxation Barrier, Revisited: Tightened Single-Neuron Relaxations for Neural Network Verification create mode 100644 data/2020/neurips/The Convolution Exponential and Generalized Sylvester Flows create mode 100644 data/2020/neurips/The Devil is in the Detail: A Framework for Macroscopic Prediction via Microscopic Models create mode 100644 data/2020/neurips/The Dilemma of TriHard Loss and an Element-Weighted TriHard Loss for Person Re-Identification create mode 100644 data/2020/neurips/The Discrete Gaussian for Differential Privacy create mode 100644 data/2020/neurips/The Diversified Ensemble Neural Network create mode 100644 data/2020/neurips/The Flajolet-Martin Sketch Itself Preserves Differential Privacy: Private Counting with Minimal Space create mode 100644 data/2020/neurips/The Generalization-Stability Tradeoff In Neural Network Pruning create mode 100644 data/2020/neurips/The Generalized Lasso with Nonlinear Observations and Generative Priors create mode 100644 data/2020/neurips/The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes create mode 100644 data/2020/neurips/The Implications of Local Correlation on Learning Some Deep Functions create mode 100644 data/2020/neurips/The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning create mode 100644 data/2020/neurips/The Lottery Ticket Hypothesis for Pre-trained BERT Networks create mode 100644 data/2020/neurips/The MAGICAL Benchmark for Robust Imitation create mode 100644 data/2020/neurips/The Mean-Squared Error of Double Q-Learning create mode 100644 data/2020/neurips/The NetHack Learning Environment create mode 100644 data/2020/neurips/The Origins and Prevalence of Texture Bias in Convolutional Neural Networks create mode 100644 data/2020/neurips/The Pitfalls of Simplicity Bias in Neural Networks create mode 100644 data/2020/neurips/The Potts-Ising model for discrete multivariate data create mode 100644 data/2020/neurips/The Power of Comparisons for Actively Learning Linear Classifiers create mode 100644 data/2020/neurips/The Power of Predictions in Online Control create mode 100644 data/2020/neurips/The Primal-Dual method for Learning Augmented Algorithms create mode 100644 data/2020/neurips/The Smoothed Possibility of Social Choice create mode 100644 data/2020/neurips/The Statistical Complexity of Early-Stopped Mirror Descent create mode 100644 data/2020/neurips/The Strong Screening Rule for SLOPE create mode 100644 data/2020/neurips/The Surprising Simplicity of the Early-Time Learning 
Dynamics of Neural Networks create mode 100644 data/2020/neurips/The Value Equivalence Principle for Model-Based Reinforcement Learning create mode 100644 data/2020/neurips/The Wasserstein Proximal Gradient Algorithm create mode 100644 data/2020/neurips/The interplay between randomness and structure during learning in RNNs create mode 100644 data/2020/neurips/The phase diagram of approximation rates for deep neural networks create mode 100644 data/2020/neurips/The route to chaos in routing games: When is price of anarchy too optimistic? create mode 100644 data/2020/neurips/Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View create mode 100644 data/2020/neurips/Theory-Inspired Path-Regularized Differential Network Architecture Search create mode 100644 data/2020/neurips/Throughput-Optimal Topology Design for Cross-Silo Federated Learning create mode 100644 data/2020/neurips/Thunder: a Fast Coordinate Selection Solver for Sparse Learning create mode 100644 data/2020/neurips/Tight First- and Second-Order Regret Bounds for Adversarial Linear Bandits create mode 100644 data/2020/neurips/Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model create mode 100644 data/2020/neurips/Tight last-iterate convergence rates for no-regret learning in multi-player games create mode 100644 data/2020/neurips/Time-Reversal Symmetric ODE Network create mode 100644 data/2020/neurips/Timeseries Anomaly Detection using Temporal Hierarchical One-Class Network create mode 100644 data/2020/neurips/TinyTL: Reduce Memory, Not Parameters for Efficient On-Device Learning create mode 100644 data/2020/neurips/Top-KAST: Top-K Always Sparse Training create mode 100644 data/2020/neurips/Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples create mode 100644 data/2020/neurips/TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search create mode 100644 data/2020/neurips/Toward the Fundamental Limits of Imitation Learning create mode 100644 data/2020/neurips/Towards Better Generalization of Adaptive Gradient Methods create mode 100644 data/2020/neurips/Towards Convergence Rate Analysis of Random Forests for Classification create mode 100644 data/2020/neurips/Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts create mode 100644 data/2020/neurips/Towards Deeper Graph Neural Networks with Differentiable Group Normalization create mode 100644 data/2020/neurips/Towards Interpretable Natural Language Understanding with Explanations as Latent Variables create mode 100644 data/2020/neurips/Towards Learning Convolutions from Scratch create mode 100644 data/2020/neurips/Towards Maximizing the Representation Gap between In-Domain & Out-of-Distribution Examples create mode 100644 data/2020/neurips/Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes create mode 100644 data/2020/neurips/Towards More Practical Adversarial Attacks on Graph Neural Networks create mode 100644 data/2020/neurips/Towards Neural Programming Interfaces create mode 100644 data/2020/neurips/Towards Playing Full MOBA Games with Deep Reinforcement Learning create mode 100644 data/2020/neurips/Towards Problem-dependent Optimal Learning Rates create mode 100644 data/2020/neurips/Towards Safe Policy Improvement for Non-Stationary MDPs create mode 100644 data/2020/neurips/Towards Scalable Bayesian Learning of Causal DAGs create mode 100644 data/2020/neurips/Towards 
Scale-Invariant Graph-related Problem Solving by Iterative Homogeneous GNNs create mode 100644 data/2020/neurips/Towards Theoretically Understanding Why Sgd Generalizes Better Than Adam in Deep Learning create mode 100644 data/2020/neurips/Towards Understanding Hierarchical Learning: Benefits of Neural Representations create mode 100644 data/2020/neurips/Towards a Better Global Loss Landscape of GANs create mode 100644 data/2020/neurips/Towards a Combinatorial Characterization of Bounded-Memory Learning create mode 100644 data/2020/neurips/Towards practical differentially private causal graph discovery create mode 100644 data/2020/neurips/Trade-offs and Guarantees of Adversarial Representation Learning for Information Obfuscation create mode 100644 data/2020/neurips/Trading Personalization for Accuracy: Data Debugging in Collaborative Filtering create mode 100644 data/2020/neurips/Train-by-Reconnect: Decoupling Locations of Weights from Their Values create mode 100644 data/2020/neurips/Training Generative Adversarial Networks by Solving Ordinary Differential Equations create mode 100644 data/2020/neurips/Training Generative Adversarial Networks with Limited Data create mode 100644 data/2020/neurips/Training Linear Finite-State Machines create mode 100644 data/2020/neurips/Training Normalizing Flows with the Information Bottleneck for Competitive Generative Classification create mode 100644 data/2020/neurips/Training Stronger Baselines for Learning to Optimize create mode 100644 data/2020/neurips/Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning create mode 100644 "data/2020/neurips/Transfer Learning via \342\204\2231 Regularization" create mode 100644 data/2020/neurips/Transferable Calibration with Lower Bias and Variance in Domain Adaptation create mode 100644 data/2020/neurips/Transferable Graph Optimizers for ML Compilers create mode 100644 data/2020/neurips/Tree! I am no Tree! I am a low dimensional Hyperbolic Embedding create mode 100644 data/2020/neurips/Triple descent and the two kinds of overfitting: where & why do they appear? 
create mode 100644 data/2020/neurips/Truncated Linear Regression in High Dimensions create mode 100644 data/2020/neurips/Trust the Model When It Is Confident: Masked Model-based Actor-Critic create mode 100644 data/2020/neurips/Truthful Data Acquisition via Peer Prediction create mode 100644 data/2020/neurips/UCLID-Net: Single View Reconstruction in Object Space create mode 100644 data/2020/neurips/UCSG-NET- Unsupervised Discovering of Constructive Solid Geometry Tree create mode 100644 data/2020/neurips/UDH: Universal Deep Hiding for Steganography, Watermarking, and Light Field Messaging create mode 100644 data/2020/neurips/UWSOD: Toward Fully-Supervised-Level Capacity Weakly Supervised Object Detection create mode 100644 data/2020/neurips/Ultra-Low Precision 4-bit Training of Deep Neural Networks create mode 100644 data/2020/neurips/Ultrahyperbolic Representation Learning create mode 100644 data/2020/neurips/UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging create mode 100644 data/2020/neurips/Unbalanced Sobolev Descent create mode 100644 data/2020/neurips/Uncertainty Aware Semi-Supervised Learning on Graph Data create mode 100644 data/2020/neurips/Uncertainty Quantification for Inferring Hawkes Networks create mode 100644 data/2020/neurips/Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation create mode 100644 data/2020/neurips/Uncertainty-aware Self-training for Few-shot Text Classification create mode 100644 data/2020/neurips/Uncovering the Topology of Time-Varying fMRI Data using Cubical Persistence create mode 100644 data/2020/neurips/Understanding Anomaly Detection with Deep Invertible Networks through Hierarchies of Distributions and Features create mode 100644 data/2020/neurips/Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks create mode 100644 data/2020/neurips/Understanding Deep Architecture with Reasoning Layer create mode 100644 data/2020/neurips/Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition create mode 100644 data/2020/neurips/Understanding Global Feature Contributions With Additive Importance Measures create mode 100644 data/2020/neurips/Understanding Gradient Clipping in Private SGD: A Geometric Perspective create mode 100644 data/2020/neurips/Understanding and Exploring the Network with Stochastic Architectures create mode 100644 data/2020/neurips/Understanding and Improving Fast Adversarial Training create mode 100644 data/2020/neurips/Understanding spiking networks through convex optimization create mode 100644 data/2020/neurips/Understanding the Role of Training Regimes in Continual Learning create mode 100644 data/2020/neurips/Unfolding recurrence by Green's functions for optimized reservoir computing create mode 100644 data/2020/neurips/Unfolding the Alternating Optimization for Blind Super Resolution create mode 100644 data/2020/neurips/Unifying Activation- and Timing-based Learning Rules for Spiking Neural Networks create mode 100644 data/2020/neurips/Universal Domain Adaptation through Self Supervision create mode 100644 data/2020/neurips/Universal Function Approximation on Graphs create mode 100644 data/2020/neurips/Universal guarantees for decision tree induction via a higher-order splitting criterion create mode 100644 data/2020/neurips/Universally Quantized Neural Compression create mode 100644 data/2020/neurips/Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms create mode 100644 
data/2020/neurips/Unsupervised Data Augmentation for Consistency Training
create mode 100644 data/2020/neurips/Unsupervised Joint k-node Graph Representations with Compositional Energy-Based Models
create mode 100644 data/2020/neurips/Unsupervised Learning of Dense Visual Representations
create mode 100644 data/2020/neurips/Unsupervised Learning of Lagrangian Dynamics from Images for Prediction and Control
create mode 100644 data/2020/neurips/Unsupervised Learning of Object Landmarks via Self-Training Correspondence
create mode 100644 data/2020/neurips/Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
create mode 100644 data/2020/neurips/Unsupervised Representation Learning by Invariance Propagation
create mode 100644 data/2020/neurips/Unsupervised Semantic Aggregation and Deformable Template Matching for Semi-Supervised Learning
create mode 100644 data/2020/neurips/Unsupervised Sound Separation Using Mixture Invariant Training
create mode 100644 data/2020/neurips/Unsupervised Text Generation by Learning from Search
create mode 100644 data/2020/neurips/Unsupervised Translation of Programming Languages
create mode 100644 data/2020/neurips/Unsupervised object-centric video generation and decomposition in 3D
create mode 100644 data/2020/neurips/Untangling tradeoffs between recurrence and self-attention in artificial neural networks
create mode 100644 data/2020/neurips/Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
create mode 100644 data/2020/neurips/User-Dependent Neural Sequence Models for Continuous-Time Event Data
create mode 100644 data/2020/neurips/Using noise to probe recurrent neural network structure and prune synapses
create mode 100644 data/2020/neurips/VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data
create mode 100644 data/2020/neurips/VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain
create mode 100644 data/2020/neurips/Value-driven Hindsight Modelling
create mode 100644 data/2020/neurips/VarGrad: A Low-Variance Gradient Estimator for Variational Inference
create mode 100644 data/2020/neurips/Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization
create mode 100644 data/2020/neurips/Variance reduction for Random Coordinate Descent-Langevin Monte Carlo
create mode 100644 data/2020/neurips/Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis
create mode 100644 data/2020/neurips/Variational Amodal Object Completion
create mode 100644 data/2020/neurips/Variational Bayesian Monte Carlo with Noisy Likelihoods
create mode 100644 data/2020/neurips/Variational Bayesian Unlearning
create mode 100644 data/2020/neurips/Variational Inference for Graph Convolutional Networks in the Absence of Graph Data and Adversarial Settings
create mode 100644 data/2020/neurips/Variational Policy Gradient Method for Reinforcement Learning with General Utilities
create mode 100644 data/2020/neurips/Video Frame Interpolation without Temporal Priors
create mode 100644 data/2020/neurips/Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement
create mode 100644 data/2020/neurips/Walking in the Shadow: A New Perspective on Descent Directions for Constrained Minimization
create mode 100644 data/2020/neurips/Walsh-Hadamard Variational Inference for Bayesian Deep Learning
create mode 100644 data/2020/neurips/Wasserstein Distances for Stereo Disparity Estimation
create mode 100644 data/2020/neurips/Watch out! Motion is Blurring the Vision of Your Deep Neural Networks
create mode 100644 data/2020/neurips/Wavelet Flow: Fast Training of High Resolution Normalizing Flows
create mode 100644 data/2020/neurips/Weak Form Generalized Hamiltonian Learning
create mode 100644 data/2020/neurips/Weakly Supervised Deep Functional Maps for Shape Matching
create mode 100644 data/2020/neurips/Weakly-Supervised Reinforcement Learning for Controllable Behavior
create mode 100644 data/2020/neurips/Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
create mode 100644 data/2020/neurips/Weisfeiler and Leman go sparse: Towards scalable higher-order graph embeddings
create mode 100644 data/2020/neurips/Weston-Watkins Hinge Loss and Ordered Partitions
create mode 100644 data/2020/neurips/What Did You Think Would Happen? Explaining Agent Behaviour through Intended Outcomes
create mode 100644 data/2020/neurips/What Do Neural Networks Learn When Trained With Random Labels?
create mode 100644 data/2020/neurips/What Makes for Good Views for Contrastive Learning?
create mode 100644 data/2020/neurips/What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
create mode 100644 data/2020/neurips/What if Neural Networks had SVDs?
create mode 100644 data/2020/neurips/What is being transferred in transfer learning?
create mode 100644 data/2020/neurips/What shapes feature representations? Exploring datasets, architectures, and training
create mode 100644 data/2020/neurips/What went wrong and when? Instance-wise feature importance for time-series black-box models
create mode 100644 data/2020/neurips/When Counterpoint Meets Chinese Folk Melodies
create mode 100644 data/2020/neurips/When Do Neural Networks Outperform Kernel Methods?
create mode 100644 data/2020/neurips/When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes
create mode 100644 data/2020/neurips/Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? - A Neural Tangent Kernel Perspective
create mode 100644 data/2020/neurips/Why Normalizing Flows Fail to Detect Out-of-Distribution Data
create mode 100644 data/2020/neurips/Why are Adaptive Methods Good for Attention Models?
create mode 100644 data/2020/neurips/Winning the Lottery with Continuous Sparsification
create mode 100644 data/2020/neurips/Wisdom of the Ensemble: Improving Consistency of Deep Learning Models
create mode 100644 data/2020/neurips/WoodFisher: Efficient Second-Order Approximation for Neural Network Compression
create mode 100644 data/2020/neurips/Woodbury Transformations for Deep Generative Flows
create mode 100644 data/2020/neurips/Worst-Case Analysis for Randomly Collected Data
create mode 100644 data/2020/neurips/X-CAL: Explicit Calibration for Survival Analysis
create mode 100644 data/2020/neurips/Your Classifier can Secretly Suffice Multi-Source Domain Adaptation
create mode 100644 data/2020/neurips/Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling
create mode 100644 data/2020/neurips/Zap Q-Learning With Nonlinear Function Approximation
create mode 100644 data/2020/neurips/Zero-Resource Knowledge-Grounded Dialogue Generation
create mode 100644 data/2020/neurips/f-Divergence Variational Inference
create mode 100644 data/2020/neurips/f-GAIL: Learning f-Divergence for Generative Adversarial Imitation Learning
create mode 100644 data/2020/neurips/wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
create mode 100644 data/2021/neurips/(Almost) Free Incentivized Exploration from Decentralized Learning Agents
create mode 100644 data/2021/neurips/3D Pose Transfer with Correspondence Learning and Mesh Refinement
create mode 100644 data/2021/neurips/3D Siamese Voxel-to-BEV Tracker for Sparse Point Clouds
create mode 100644 data/2021/neurips/3DP3: 3D Scene Perception via Probabilistic Programming
create mode 100644 data/2021/neurips/A 3D Generative Model for Structure-Based Drug Design
create mode 100644 data/2021/neurips/A B Testing for Recommender Systems in a Two-sided Marketplace
create mode 100644 data/2021/neurips/A B n Testing with Control in the Presence of Subpopulations
create mode 100644 data/2021/neurips/A Bayesian-Symbolic Approach to Reasoning and Learning in Intuitive Physics
create mode 100644 data/2021/neurips/A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs
create mode 100644 data/2021/neurips/A Biased Graph Neural Network Sampler with Near-Optimal Regret
create mode 100644 data/2021/neurips/A Causal Lens for Controllable Text Generation
create mode 100644 data/2021/neurips/A Central Limit Theorem for Differentially Private Query Answering
create mode 100644 data/2021/neurips/A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms
create mode 100644 data/2021/neurips/A Compositional Atlas of Tractable Circuit Operations for Probabilistic Inference
create mode 100644 data/2021/neurips/A Comprehensively Tight Analysis of Gradient Descent for PCA
create mode 100644 data/2021/neurips/A Computationally Efficient Method for Learning Exponential Family Distributions
create mode 100644 data/2021/neurips/A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
create mode 100644 data/2021/neurips/A Constant Approximation Algorithm for Sequential Random-Order No-Substitution k-Median Clustering
create mode 100644 data/2021/neurips/A Continuous Mapping For Augmentation Design
create mode 100644 data/2021/neurips/A Contrastive Learning Approach for Training Variational Autoencoder Priors
create mode 100644 data/2021/neurips/A Convergence Analysis of Gradient Descent on Graph Neural Networks
create mode 100644 data/2021/neurips/A Critical Look at the Consistency of Causal Estimation with Deep Latent Variable Models
create mode 100644 data/2021/neurips/A Domain-Shrinking based Bayesian Optimization Algorithm with Order-Optimal Regret Performance
create mode 100644 data/2021/neurips/A Faster Decentralized Algorithm for Nonconvex Minimax Problems
create mode 100644 data/2021/neurips/A Faster Maximum Cardinality Matching Algorithm with Applications in Machine Learning
create mode 100644 data/2021/neurips/A Framework to Learn with Interpretation
create mode 100644 data/2021/neurips/A Gang of Adversarial Bandits
create mode 100644 data/2021/neurips/A Gaussian Process-Bayesian Bernoulli Mixture Model for Multi-Label Active Learning
create mode 100644 data/2021/neurips/A Geometric Analysis of Neural Collapse with Unconstrained Features
create mode 100644 data/2021/neurips/A Geometric Perspective towards Neural Calibration via Sensitivity Decomposition
create mode 100644 data/2021/neurips/A Geometric Structure of Acceleration and Its Role in Making Gradients Small Fast
create mode 100644 data/2021/neurips/A Gradient Method for Multilevel Optimization
create mode 100644 data/2021/neurips/A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems
create mode 100644 data/2021/neurips/A Highly-Efficient Group Elastic Net Algorithm with an Application to Function-On-Scalar Regression
create mode 100644 data/2021/neurips/A Kernel-based Test of Independence for Cluster-correlated Data
create mode 100644 data/2021/neurips/A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning
create mode 100644 data/2021/neurips/A Little Robustness Goes a Long Way: Leveraging Robust Features for Targeted Transfer Attacks
create mode 100644 data/2021/neurips/A Mathematical Framework for Quantifying Transferability in Multi-source Transfer Learning
create mode 100644 data/2021/neurips/A Max-Min Entropy Framework for Reinforcement Learning
create mode 100644 data/2021/neurips/A Minimalist Approach to Offline Reinforcement Learning
create mode 100644 data/2021/neurips/A Multi-Implicit Neural Representation for Fonts
create mode 100644 data/2021/neurips/A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models
create mode 100644 data/2021/neurips/A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum
create mode 100644 data/2021/neurips/A New Theoretical Framework for Fast and Accurate Online Decision-Making
create mode 100644 data/2021/neurips/A No-go Theorem for Robust Acceleration in the Hyperbolic Plane
create mode 100644 data/2021/neurips/A Non-commutative Extension of Lee-Seung's Algorithm for Positive Semidefinite Factorizations
create mode 100644 data/2021/neurips/A Normative and Biologically Plausible Algorithm for Independent Component Analysis
create mode 100644 data/2021/neurips/A Note on Sparse Generalized Eigenvalue Problem
create mode 100644 data/2021/neurips/A PAC-Bayes Analysis of Adversarial Robustness
create mode 100644 data/2021/neurips/A Probabilistic State Space Model for Joint Inference from Differential Equations and Data
create mode 100644 data/2021/neurips/A Prototype-Oriented Framework for Unsupervised Domain Adaptation
create mode 100644 data/2021/neurips/A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning
create mode 100644 data/2021/neurips/A Provably Efficient Sample Collection Strategy for Reinforcement Learning
create mode 100644 data/2021/neurips/A Regression Approach to Learning-Augmented Online Algorithms
create mode 100644 data/2021/neurips/A Separation Result Between Data-oblivious and Data-aware Poisoning Attacks
create mode 100644 data/2021/neurips/A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis
create mode 100644 data/2021/neurips/A Stochastic Newton Algorithm for Distributed Convex Optimization
create mode 100644 data/2021/neurips/A Surrogate Objective Framework for Prediction+Programming with Soft Constraints
create mode 100644 data/2021/neurips/A Theoretical Analysis of Fine-tuning with Linear Teachers
create mode 100644 data/2021/neurips/A Theory of the Distortion-Perception Tradeoff in Wasserstein Space
create mode 100644 data/2021/neurips/A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning
create mode 100644 data/2021/neurips/A Topological Perspective on Causal Inference
create mode 100644 data/2021/neurips/A Trainable Spectral-Spatial Sparse Coding Model for Hyperspectral Image Restoration
create mode 100644 data/2021/neurips/A Unified Approach to Fair Online Learning via Blackwell Approachability
create mode 100644 data/2021/neurips/A Unified View of cGANs with and without Classifiers
create mode 100644 data/2021/neurips/A Universal Law of Robustness via Isoperimetry
create mode 100644 data/2021/neurips/A Variational Perspective on Diffusion-Based Generative Models and Score Matching
create mode 100644 data/2021/neurips/A Winning Hand: Compressing Deep Networks Can Improve Out-of-Distribution Robustness
create mode 100644 data/2021/neurips/A first-order primal-dual method with adaptivity to local smoothness
create mode 100644 data/2021/neurips/A flow-based latent state generative model of neural population responses to natural images
create mode 100644 data/2021/neurips/A generative nonparametric Bayesian model for whole genomes
create mode 100644 data/2021/neurips/A mechanistic multi-area recurrent network model of decision-making
create mode 100644 data/2021/neurips/A nonparametric method for gradual change problems with statistical guarantees
create mode 100644 data/2021/neurips/A novel notion of barycenter for probability distributions based on optimal weak mass transport
create mode 100644 data/2021/neurips/A sampling-based circuit for optimal decision making
create mode 100644 data/2021/neurips/A self consistent theory of Gaussian Processes captures feature learning effects in finite CNNs
create mode 100644 data/2021/neurips/A single gradient step finds adversarial examples on random two-layers neural networks
create mode 100644 data/2021/neurips/A unified framework for bandit multiple testing
create mode 100644 data/2021/neurips/A universal probabilistic spike count model reveals ongoing modulation of neural variability
create mode 100644 data/2021/neurips/A variational approximate posterior for the deep Wishart process
create mode 100644 data/2021/neurips/A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose
create mode 100644 data/2021/neurips/ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning
create mode 100644 data/2021/neurips/AC DC: Alternating Compressed DeCompressed Training of Deep Neural Networks
create mode 100644 data/2021/neurips/AC-GC: Lossy Activation Compression with Guaranteed Convergence
create mode 100644 data/2021/neurips/AFEC: Active Forgetting of Negative Transfer in Continual Learning
create mode 100644 data/2021/neurips/ASSANet: An Anisotropic Separable Set Abstraction for Efficient Point Cloud Representation Learning
create mode 100644 data/2021/neurips/ATISS: Autoregressive Transformers for Indoor Scene Synthesis
create mode 100644 data/2021/neurips/Absolute Neighbour Difference based Correlation Test for Detecting Heteroscedastic Relationships
create mode 100644 data/2021/neurips/Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N: M Transposable Masks
create mode 100644 data/2021/neurips/Accelerating Quadratic Optimization with Reinforcement Learning
create mode 100644 data/2021/neurips/Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives
create mode 100644 data/2021/neurips/Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning
create mode 100644 data/2021/neurips/Accumulative Poisoning Attacks on Real-time Data
create mode 100644 data/2021/neurips/Accurate Point Cloud Registration with Robust Optimal Transport
create mode 100644 data/2021/neurips/Accurately Solving Rod Dynamics with Graph Learning
create mode 100644 data/2021/neurips/Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning
create mode 100644 data/2021/neurips/Achieving Rotational Invariance with Bessel-Convolutional Neural Networks
create mode 100644 data/2021/neurips/Across-animal odor decoding by probabilistic manifold alignment
create mode 100644 data/2021/neurips/Action-guided 3D Human Motion Prediction
create mode 100644 data/2021/neurips/Activation Sharing with Asymmetric Paths Solves Weight Transport Problem without Bidirectional Connection
create mode 100644 data/2021/neurips/Active 3D Shape Reconstruction from Vision and Touch
create mode 100644 data/2021/neurips/Active Assessment of Prediction Services as Accuracy Surface Over Attribute Combinations
create mode 100644 data/2021/neurips/Active Learning of Convex Halfspaces on Graphs
create mode 100644 data/2021/neurips/Active Offline Policy Selection
create mode 100644 data/2021/neurips/Active clustering for labeling training data
create mode 100644 data/2021/neurips/Actively Identifying Causal Effects with Latent Variables Given Only Response Variable Observable
create mode 100644 data/2021/neurips/Adaptable Agent Populations via a Generative Model of Policies
create mode 100644 data/2021/neurips/Adapting to function difficulty and growth conditions in private optimization
create mode 100644 data/2021/neurips/Adaptive Conformal Inference Under Distribution Shift
create mode 100644 data/2021/neurips/Adaptive Data Augmentation on Temporal Graphs
create mode 100644 data/2021/neurips/Adaptive Denoising via GainTuning
create mode 100644 data/2021/neurips/Adaptive Diffusion in Graph Neural Networks
create mode 100644 data/2021/neurips/Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback
create mode 100644 data/2021/neurips/Adaptive First-Order Methods Revisited: Convex Minimization without Lipschitz Requirements
create mode 100644 data/2021/neurips/Adaptive Machine Unlearning
create mode 100644 data/2021/neurips/Adaptive Online Packing-guided Search for POMDPs
create mode 100644 data/2021/neurips/Adaptive Proximal Gradient Methods for Structured Neural Networks
create mode 100644 data/2021/neurips/Adaptive Risk Minimization: Learning to Adapt to Domain Shift
create mode 100644 data/2021/neurips/Adaptive Sampling for Minimax Fair Classification
create mode 100644 data/2021/neurips/Adaptive wavelet distillation from neural networks through interpretations
create mode 100644 data/2021/neurips/Adder Attention for Vision Transformer
create mode 100644 data/2021/neurips/Addressing Algorithmic Disparity and Performance Inconsistency in Federated Learning
create mode 100644 data/2021/neurips/Adjusting for Autocorrelated Errors in Neural Networks for Time Series
create mode 100644 data/2021/neurips/Adversarial Attack Generation Empowered by Min-Max Optimization
create mode 100644 data/2021/neurips/Adversarial Attacks on Black Box Video Classifiers: Leveraging the Power of Geometric Transformations
create mode 100644 data/2021/neurips/Adversarial Attacks on Graph Classifiers via Bayesian Optimisation
create mode 100644 data/2021/neurips/Adversarial Examples Make Strong Poisons
create mode 100644 data/2021/neurips/Adversarial Examples for k-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams
create mode 100644 data/2021/neurips/Adversarial Examples in Multi-Layer Random ReLU Networks
create mode 100644 data/2021/neurips/Adversarial Feature Desensitization
create mode 100644 data/2021/neurips/Adversarial Graph Augmentation to Improve Graph Contrastive Learning
create mode 100644 data/2021/neurips/Adversarial Intrinsic Motivation for Reinforcement Learning
create mode 100644 data/2021/neurips/Adversarial Neuron Pruning Purifies Backdoored Deep Models
create mode 100644 data/2021/neurips/Adversarial Regression with Doubly Non-negative Weighting Matrices
create mode 100644 data/2021/neurips/Adversarial Reweighting for Partial Domain Adaptation
create mode 100644 data/2021/neurips/Adversarial Robustness with Non-uniform Perturbations
create mode 100644 data/2021/neurips/Adversarial Robustness with Semi-Infinite Constrained Learning
create mode 100644 data/2021/neurips/Adversarial Robustness without Adversarial Training: A Teacher-Guided Curriculum Learning Approach
create mode 100644 data/2021/neurips/Adversarial Teacher-Student Representation Learning for Domain Generalization
create mode 100644 data/2021/neurips/Adversarial Training Helps Transfer Learning via Better Representations
create mode 100644 data/2021/neurips/Adversarially Robust 3D Point Cloud Recognition Using Self-Supervisions
create mode 100644 data/2021/neurips/Adversarially Robust Change Point Detection
create mode 100644 data/2021/neurips/Adversarially robust learning for security-constrained optimal power flow
create mode 100644 data/2021/neurips/Agent Modelling under Partial Observability for Deep Reinforcement Learning
create mode 100644 data/2021/neurips/Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations
create mode 100644 data/2021/neurips/Algorithmic Instabilities of Accelerated Gradient Descent
create mode 100644 data/2021/neurips/Algorithmic stability and generalization of an unsupervised feature selection algorithm
create mode 100644 data/2021/neurips/Alias-Free Generative Adversarial Networks
create mode 100644 data/2021/neurips/Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
create mode 100644 data/2021/neurips/Aligned Structured Sparsity Learning for Efficient Image Super-Resolution
create mode 100644 data/2021/neurips/Aligning Pretraining for Detection via Object-Level Contrastive Learning
create mode 100644 data/2021/neurips/Aligning Silhouette Topology for Self-Adaptive 3D Human Pose Recovery
create mode 100644 data/2021/neurips/Alignment Attention by Matching Key and Query Distributions
create mode 100644 data/2021/neurips/All Tokens Matter: Token Labeling for Training Better Vision Transformers
create mode 100644 data/2021/neurips/Amortized Synthesis of Constrained Configurations Using a Differentiable Surrogate
create mode 100644 data/2021/neurips/Amortized Variational Inference for Simple Hierarchical Models
create mode 100644 data/2021/neurips/An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias
create mode 100644 data/2021/neurips/An Axiomatic Theory of Provably-Fair Welfare-Centric Machine Learning
create mode 100644 data/2021/neurips/An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints
create mode 100644 data/2021/neurips/An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning
create mode 100644 data/2021/neurips/An Empirical Investigation of Domain Generalization with Empirical Risk Minimizers
create mode 100644 data/2021/neurips/An Empirical Study of Adder Neural Networks for Object Detection
create mode 100644 data/2021/neurips/An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning
create mode 100644 data/2021/neurips/An Exact Characterization of the Generalization Error for the Gibbs Algorithm
create mode 100644 data/2021/neurips/An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks
create mode 100644 data/2021/neurips/An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap
create mode 100644 data/2021/neurips/An Image is Worth More Than a Thousand Words: Towards Disentanglement in The Wild
create mode 100644 data/2021/neurips/An Improved Analysis and Rates for Variance Reduction under Without-replacement Sampling Orders
create mode 100644 data/2021/neurips/An Improved Analysis of Gradient Tracking for Decentralized Machine Learning
create mode 100644 data/2021/neurips/An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence
create mode 100644 data/2021/neurips/An Information-theoretic Approach to Distribution Shifts
create mode 100644 data/2021/neurips/An Online Method for A Class of Distributionally Robust Optimization with Non-convex Objectives
create mode 100644 data/2021/neurips/An Online Riemannian PCA for Stochastic Canonical Correlation Analysis
create mode 100644 data/2021/neurips/An Uncertainty Principle is a Price of Privacy-Preserving Microdata
create mode 100644 data/2021/neurips/An analysis of Ermakov-Zolotukhin quadrature using kernels
create mode 100644 data/2021/neurips/An online passive-aggressive algorithm for difference-of-squares classification
create mode 100644 data/2021/neurips/Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
create mode 100644 data/2021/neurips/Analysis of Sensing Spectral for Signal Recovery under a Generalized Linear Model
create mode 100644 data/2021/neurips/Analysis of one-hidden-layer neural networks via the resolvent method
create mode 100644 data/2021/neurips/Analytic Insights into Structure and Rank of Neural Network Hessian Maps
create mode 100644 data/2021/neurips/Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problems
create mode 100644 data/2021/neurips/Analyzing the Confidentiality of Undistillable Teachers in Knowledge Distillation
create mode 100644 data/2021/neurips/Analyzing the Generalization Capability of SGLD Using Properties of Gaussian Channels
create mode 100644 data/2021/neurips/Answering Complex Causal Queries With the Maximum Causal Set Effect
create mode 100644 data/2021/neurips/Anti-Backdoor Learning: Training Clean Models on Poisoned Data
create mode 100644 data/2021/neurips/Antipodes of Label Differential Privacy: PATE and ALIBI
create mode 100644 data/2021/neurips/Approximate Decomposable Submodular Function Minimization for Cardinality-Based Components
create mode 100644 data/2021/neurips/Approximate optimization of convex functions with outlier noise
create mode 100644 data/2021/neurips/Approximating the Permanent with Deep Rejection Sampling
create mode 100644 data/2021/neurips/Arbitrary Conditional Distributions with Energy
create mode 100644 data/2021/neurips/Are My Deep Learning Systems Fair? An Empirical Study of Fixed-Seed Training
create mode 100644 data/2021/neurips/Are Transformers more robust than CNNs?
create mode 100644 data/2021/neurips/Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions
create mode 100644 data/2021/neurips/Artistic Style Transfer with Internal-external Learning and Contrastive Learning
create mode 100644 data/2021/neurips/Assessing Fairness in the Presence of Missing Data
create mode 100644 data/2021/neurips/Associating Objects with Transformers for Video Object Segmentation
create mode 100644 data/2021/neurips/Associative Memories via Predictive Coding
create mode 100644 data/2021/neurips/Asymptotically Best Causal Effect Identification with Multi-Armed Bandits
create mode 100644 data/2021/neurips/Asymptotically Exact Error Characterization of Offline Policy Evaluation with Misspecified Linear Models
create mode 100644 data/2021/neurips/Asymptotics of representation learning in finite Bayesian neural networks
create mode 100644 data/2021/neurips/Asymptotics of the Bootstrap via Stability with Applications to Inference with Model Selection
create mode 100644 data/2021/neurips/Asynchronous Decentralized Online Learning
create mode 100644 data/2021/neurips/Asynchronous Decentralized SGD with Quantized and Local Updates
create mode 100644 data/2021/neurips/Asynchronous Stochastic Optimization Robust to Arbitrary Delays
create mode 100644 data/2021/neurips/Attention Approximates Sparse Distributed Memory
create mode 100644 data/2021/neurips/Attention Bottlenecks for Multimodal Fusion
create mode 100644 data/2021/neurips/Attention over Learned Object Embeddings Enables Complex Visual Reasoning
create mode 100644 data/2021/neurips/Auditing Black-Box Prediction Models for Data Minimization Compliance
create mode 100644 data/2021/neurips/AugMax: Adversarial Composition of Random Augmentations for Robust Training
create mode 100644 data/2021/neurips/Augmented Shortcuts for Vision Transformers
create mode 100644 data/2021/neurips/Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation
create mode 100644 data/2021/neurips/AutoBalance: Optimized Loss Functions for Imbalanced Data
create mode 100644 data/2021/neurips/AutoGEL: An Automated Graph Neural Network with Explicit Link Information
create mode 100644 data/2021/neurips/Autobahn: Automorphism-based Graph Neural Nets
create mode 100644 data/2021/neurips/Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
create mode 100644 data/2021/neurips/Automated Discovery of Adaptive Attacks on Adversarial Defenses
create mode 100644 data/2021/neurips/Automated Dynamic Mechanism Design
create mode 100644 data/2021/neurips/Automatic Data Augmentation for Generalization in Reinforcement Learning
create mode 100644 data/2021/neurips/Automatic Symmetry Discovery with Lie Algebra Convolutional Network
create mode 100644 data/2021/neurips/Automatic Unsupervised Outlier Model Selection
create mode 100644 data/2021/neurips/Automatic and Harmless Regularization with Constrained and Lexicographic Optimization: A Dynamic Barrier Approach
create mode 100644 data/2021/neurips/Automorphic Equivalence-aware Graph Neural Network
create mode 100644 data/2021/neurips/Autonomous Reinforcement Learning via Subgoal Curricula
create mode 100644 data/2021/neurips/Average-Reward Learning and Planning with Options
create mode 100644 data/2021/neurips/Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent
create mode 100644 data/2021/neurips/BARTScore: Evaluating Generated Text as Text Generation
create mode 100644 data/2021/neurips/BAST: Bayesian Additive Regression Spanning Trees for Complex Constrained Domain
create mode 100644 data/2021/neurips/BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery
create mode 100644 "data/2021/neurips/BCORLE(\316\273): An Offline Reinforcement Learning and Evaluation Framework for Coupons Allocation in E-commerce Market"
create mode 100644 data/2021/neurips/BNS: Building Network Structures Dynamically for Continual Learning
create mode 100644 data/2021/neurips/Baby Intuitions Benchmark (BIB): Discerning the goals, preferences, and actions of others
create mode 100644 data/2021/neurips/Backdoor Attack with Imperceptible Input and Latent Modification
create mode 100644 data/2021/neurips/Backward-Compatible Prediction Updates: A Probabilistic Approach
create mode 100644 data/2021/neurips/Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion
create mode 100644 data/2021/neurips/Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval
create mode 100644 data/2021/neurips/Bandit Learning with Delayed Impact of Actions
create mode 100644 data/2021/neurips/Bandit Phase Retrieval
create mode 100644 data/2021/neurips/Bandit Quickest Changepoint Detection
create mode 100644 data/2021/neurips/Bandits with Knapsacks beyond the Worst Case
create mode 100644 data/2021/neurips/Bandits with many optimal arms
create mode 100644 data/2021/neurips/Batch Active Learning at Scale
create mode 100644 data/2021/neurips/Batch Multi-Fidelity Bayesian Optimization with Deep Auto-Regressive Networks
create mode 100644 data/2021/neurips/Batch Normalization Orthogonalizes Representations in Deep Random Networks
create mode 100644 data/2021/neurips/BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer
create mode 100644 data/2021/neurips/Batched Thompson Sampling
create mode 100644 data/2021/neurips/BayesIMP: Uncertainty Quantification for Causal Data Fusion
create mode 100644 data/2021/neurips/Bayesian Adaptation for Covariate Shift
create mode 100644 data/2021/neurips/Bayesian Bellman Operators
create mode 100644 data/2021/neurips/Bayesian Optimization of Function Networks
create mode 100644 data/2021/neurips/Bayesian Optimization with High-Dimensional Outputs
create mode 100644 data/2021/neurips/Bayesian decision-making under misspecified priors with applications to meta-learning
create mode 100644 data/2021/neurips/Be Confident! Towards Trustworthy Graph Neural Networks via Confidence Calibration
create mode 100644 data/2021/neurips/Behavior From the Void: Unsupervised Active Pre-Training
create mode 100644 data/2021/neurips/Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
create mode 100644 data/2021/neurips/Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms
create mode 100644 data/2021/neurips/Bellman-consistent Pessimism for Offline Reinforcement Learning
create mode 100644 data/2021/neurips/Beltrami Flow and Neural Diffusion on Graphs
create mode 100644 data/2021/neurips/Benign Overfitting in Multiclass Classification: All Roads Lead to Interpolation
create mode 100644 data/2021/neurips/BernNet: Learning Arbitrary Graph Spectral Filters via Bernstein Approximation
create mode 100644 data/2021/neurips/Best Arm Identification in Contaminated Stochastic Bandits
create mode 100644 data/2021/neurips/Best of Both Worlds: Practical and Theoretically Optimal Submodular Maximization in Parallel
create mode 100644 data/2021/neurips/Best-case lower bounds in online learning
create mode 100644 data/2021/neurips/Beta-CROWN: Efficient Bound Propagation with Per-neuron Split Constraints for Neural Network Robustness Verification
create mode 100644 data/2021/neurips/Better Algorithms for Individually Fair k-Clustering
create mode 100644 data/2021/neurips/Better Safe Than Sorry: Preventing Delusive Adversaries with Adversarial Training
create mode 100644 data/2021/neurips/Beware of the Simulated DAG! Causal Discovery Benchmarks May Be Easy to Game
create mode 100644 data/2021/neurips/Beyond Bandit Feedback in Online Multiclass Classification
create mode 100644 data/2021/neurips/Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning
create mode 100644 data/2021/neurips/Beyond Pinball Loss: Quantile Methods for Calibrated Uncertainty Quantification
create mode 100644 data/2021/neurips/Beyond Smoothness: Incorporating Low-Rank Analysis into Nonparametric Density Estimation
create mode 100644 data/2021/neurips/Beyond Tikhonov: faster learning with self-concordant losses, via iterative regularization
create mode 100644 data/2021/neurips/Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning
create mode 100644 data/2021/neurips/Beyond the Signs: Nonparametric Tensor Completion via Sign Series
create mode 100644 data/2021/neurips/Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models
create mode 100644 data/2021/neurips/Bias and variance of the Bayesian-mean decoder
create mode 100644 data/2021/neurips/Biological key-value memory networks
create mode 100644 data/2021/neurips/Black Box Probabilistic Numerics
create mode 100644 data/2021/neurips/BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation
create mode 100644 data/2021/neurips/Blending Anti-Aliasing into Vision Transformer
create mode 100644 data/2021/neurips/BooVAE: Boosting Approach for Continual Learning of VAE
create mode 100644 data/2021/neurips/BooVI: Provably Efficient Bootstrapped Value Iteration
create mode 100644 data/2021/neurips/Boost Neural Networks by Checkpoints
create mode 100644 data/2021/neurips/Boosted CVaR Classification
create mode 100644 data/2021/neurips/Boosting with Multiple Sources
create mode 100644 data/2021/neurips/Bootstrap Your Object Detector via Mixed Training
create mode 100644 data/2021/neurips/Bootstrapping the Error of Oja's Algorithm
create mode 100644 data/2021/neurips/Bounds all around: training energy-based models with bidirectional bounds
create mode 100644 data/2021/neurips/Breaking the Dilemma of Medical Image-to-image Translation
create mode 100644 data/2021/neurips/Breaking the Linear Iteration Cost Barrier for Some Well-known Conditional Gradient Methods Using MaxIP Data-structures
create mode 100644 data/2021/neurips/Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
create mode 100644 data/2021/neurips/Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning
create mode 100644 data/2021/neurips/Breaking the centralized barrier for cross-device federated learning
create mode 100644 data/2021/neurips/Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning
create mode 100644 data/2021/neurips/Bridging Explicit and Implicit Deep Generative Models via Neural Stein Estimators
create mode 100644 data/2021/neurips/Bridging Non Co-occurrence with Unlabeled In-the-wild Data for Incremental Object Detection
create mode 100644 data/2021/neurips/Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
create mode 100644 data/2021/neurips/Bridging the Gap Between Practice and PAC-Bayes Theory in Few-Shot Meta-Learning
create mode 100644 data/2021/neurips/Bridging the Imitation Gap by Adaptive Insubordination
create mode 100644 data/2021/neurips/Bubblewrap: Online tiling and real-time flow prediction on neural manifolds
create mode 100644 data/2021/neurips/BulletTrain: Accelerating Robust Neural Network Training via Boundary Example Mining
create mode 100644 data/2021/neurips/ByPE-VAE: Bayesian Pseudocoresets Exemplar VAE
create mode 100644 data/2021/neurips/CAM-GAN: Continual Adaptation Modules for Generative Adversarial Networks
create mode 100644 data/2021/neurips/CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression
create mode 100644 data/2021/neurips/CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings
create mode 100644 data/2021/neurips/CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator
create mode 100644 data/2021/neurips/CATs: Cost Aggregation Transformers for Visual Correspondence
create mode 100644 data/2021/neurips/CBP: backpropagation with constraint on weight precision using a pseudo-Lagrange multiplier method
create mode 100644 data/2021/neurips/CCVS: Context-aware Controllable Video Synthesis
create mode 100644 data/2021/neurips/CHIP: CHannel Independence-based Pruning for Compact Neural Networks
create mode 100644 data/2021/neurips/CLDA: Contrastive Learning for Semi-Supervised Domain Adaptation
create mode 100644 data/2021/neurips/CLIP-It! Language-Guided Video Summarization
create mode 100644 data/2021/neurips/CO-PILOT: COllaborative Planning and reInforcement Learning On sub-Task curriculum
create mode 100644 data/2021/neurips/COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
create mode 100644 data/2021/neurips/COHESIV: Contrastive Object and Hand Embedding Segmentation In Video
create mode 100644 data/2021/neurips/COMBO: Conservative Offline Model-Based Policy Optimization
create mode 100644 data/2021/neurips/CROCS: Clustering and Retrieval of Cardiac Signals Based on Patient Disease Class, Sex, and Age
create mode 100644 data/2021/neurips/CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation
create mode 100644 data/2021/neurips/Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration
create mode 100644 data/2021/neurips/Calibration and Consistency of Adversarial Surrogate Losses
create mode 100644 data/2021/neurips/Can Information Flows Suggest Targets for Interventions in Neural Circuits?
create mode 100644 data/2021/neurips/Can Less be More? When Increasing-to-Balancing Label Noise Rates Considered Beneficial
create mode 100644 data/2021/neurips/Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks
create mode 100644 data/2021/neurips/Can contrastive learning avoid shortcut solutions?
create mode 100644 data/2021/neurips/Can fMRI reveal the representation of syntactic structure in the brain?
create mode 100644 data/2021/neurips/Can multi-label classification networks know what they don't know?
create mode 100644 data/2021/neurips/Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression
create mode 100644 data/2021/neurips/Can we have it all? On the Trade-off between Spatial and Adversarial Robustness of Neural Networks
create mode 100644 data/2021/neurips/Canonical Capsules: Self-Supervised Capsules in Canonical Pose
create mode 100644 data/2021/neurips/Capacity and Bias of Learned Geometric Embeddings for Directed Graphs
create mode 100644 data/2021/neurips/Capturing implicit hierarchical structure in 3D biomedical images with self-supervised hyperbolic representations
create mode 100644 data/2021/neurips/Cardinality constrained submodular maximization for random streams
create mode 100644 data/2021/neurips/Cardinality-Regularized Hawkes-Granger Model
create mode 100644 data/2021/neurips/Catalytic Role Of Noise And Necessity Of Inductive Biases In The Emergence Of Compositional Communication
create mode 100644 data/2021/neurips/Catastrophic Data Leakage in Vertical Federated Learning
create mode 100644 data/2021/neurips/Catch-A-Waveform: Learning to Generate Audio from a Single Short Example
create mode 100644 data/2021/neurips/Causal Abstractions of Neural Networks
create mode 100644 data/2021/neurips/Causal Bandits with Unknown Graph Structure
create mode 100644 data/2021/neurips/Causal Effect Inference for Structured Treatments
create mode 100644 data/2021/neurips/Causal Identification with Matrix Equations
create mode 100644 data/2021/neurips/Causal Inference for Event Pairs in Multivariate Point Processes
create mode 100644 data/2021/neurips/Causal Influence Detection for Improving Efficiency in Reinforcement Learning
create mode 100644 data/2021/neurips/Causal Navigation by Continuous-time Neural Networks
create mode 100644 data/2021/neurips/Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data
create mode 100644 data/2021/neurips/Celebrating Diversity in Shared Multi-Agent Reinforcement Learning
create mode 100644 data/2021/neurips/Center Smoothing: Certified Robustness for Networks with Structured Outputs
create mode 100644 data/2021/neurips/CentripetalText: An Efficient Text Instance Representation for Scene Text Detection
create mode 100644 data/2021/neurips/Certifying Robustness to Programmable Data Bias in Decision Trees
create mode 100644 data/2021/neurips/Challenges and Opportunities in High Dimensional Variational Inference
create mode 100644 data/2021/neurips/Change Point Detection via Multivariate Singular Spectrum Analysis
create mode 100644 data/2021/neurips/Channel Permutations for N: M Sparsity
create mode 100644 data/2021/neurips/Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning
create mode 100644 data/2021/neurips/Characterizing possible failure modes in physics-informed neural networks
create mode 100644 data/2021/neurips/Characterizing the risk of fairwashing
create mode 100644 data/2021/neurips/Charting and Navigating the Space of Solutions for Recurrent Neural Networks
create mode 100644 data/2021/neurips/Chasing Sparsity in Vision Transformers: An End-to-End Exploration
create mode 100644 data/2021/neurips/Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote
create mode 100644 data/2021/neurips/Choose a Transformer: Fourier or Galerkin
create mode 100644 data/2021/neurips/Circa: Stochastic ReLUs for Private Deep Learning
create mode 100644 data/2021/neurips/Class-Disentanglement and Applications in Adversarial Detection and Defense
create mode 100644 data/2021/neurips/Class-Incremental Learning via Dual Augmentation
create mode 100644 data/2021/neurips/Class-agnostic Reconstruction of Dynamic Objects from Videos
create mode 100644 data/2021/neurips/Clockwork Variational Autoencoders
create mode 100644 data/2021/neurips/Closing the Gap: Tighter Analysis of Alternating Stochastic Gradient Methods for Bilevel Problems
create mode 100644 data/2021/neurips/Closing the loop in medical decision support by understanding clinical decision-making: A case study on organ transplantation
create mode 100644 data/2021/neurips/Clustering Effect of Adversarial Robust Models
create mode 100644 data/2021/neurips/Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning
create mode 100644 data/2021/neurips/Co-evolution Transformer for Protein Contact Prediction
create mode 100644 data/2021/neurips/CoAtNet: Marrying Convolution and Attention for All Data Sizes
create mode 100644 data/2021/neurips/CoFiNet: Reliable Coarse-to-fine Correspondences for Robust PointCloud Registration
create mode 100644 data/2021/neurips/CoFrNets: Interpretable Neural Architecture Inspired by Continued Fractions
create mode 100644 data/2021/neurips/Coarse-to-fine Animal Pose and Shape Estimation
create mode 100644 data/2021/neurips/Cockpit: A Practical Debugging Tool for the Training of Deep Neural Networks
create mode 100644 data/2021/neurips/CogView: Mastering Text-to-Image Generation via Transformers
create mode 100644 data/2021/neurips/Collaborating with Humans without Human Data
create mode 100644 data/2021/neurips/Collaborative Causal Discovery with Atomic Interventions
create mode 100644 data/2021/neurips/Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning)
create mode 100644 data/2021/neurips/Collaborative Uncertainty in Multi-Agent Trajectory Forecasting
create mode 100644 data/2021/neurips/Collapsed Variational Bounds for Bayesian Neural Networks
create mode 100644 data/2021/neurips/Combating Noise: Semi-supervised Learning by Region Uncertainty Quantification
create mode 100644 data/2021/neurips/Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach
create mode 100644 data/2021/neurips/Combinatorial Pure Exploration with Bottleneck Reward Function
create mode 100644 data/2021/neurips/Combiner: Full Attention Transformer with Sparse Computation Cost
create mode 100644 data/2021/neurips/Combining Human Predictions with Model Probabilities via Confusion Matrices and Calibration
create mode 100644 data/2021/neurips/Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces
create mode 100644 data/2021/neurips/Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers
create mode 100644 data/2021/neurips/Communication-efficient SGD: From Local SGD to One-Shot Averaging
create mode 100644 data/2021/neurips/Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
create mode 100644 data/2021/neurips/Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization
create mode 100644 data/2021/neurips/Compositional Modeling of Nonlinear Dynamical Systems with ODE-based Random Features
create mode 100644 data/2021/neurips/Compositional Reinforcement Learning from Logical Specifications
create mode 100644 data/2021/neurips/Compositional Transformers for Scene Generation
create mode 100644 data/2021/neurips/Comprehensive Knowledge Distillation with Causal Intervention
create mode 100644 data/2021/neurips/Compressed Video Contrastive Learning
create mode 100644 data/2021/neurips/Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition
create mode 100644 data/2021/neurips/Compressive Visual Representations
create mode 100644 data/2021/neurips/Computer-Aided Design as Language
create mode 100644 data/2021/neurips/ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs
create mode 100644 data/2021/neurips/Concentration inequalities under sub-Gaussian and sub-exponential conditions
create mode 100644 data/2021/neurips/Conditional Generation Using Polynomial Expansions
create mode 100644 data/2021/neurips/Conditionally Parameterized, Discretization-Aware Neural Networks for Mesh-Based Modeling of Physical Systems
create mode 100644 data/2021/neurips/Conditioning Sparse Variational Gaussian Processes for Online Decision-making
create mode 100644 data/2021/neurips/Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality
create mode 100644 data/2021/neurips/Confident Anchor-Induced Multi-Source Free Domain Adaptation
create mode 100644 data/2021/neurips/Conflict-Averse Gradient Descent for Multi-task learning
create mode 100644 data/2021/neurips/Conformal Bayesian Computation
create mode 100644 data/2021/neurips/Conformal Prediction using Conditional Histograms
create mode 100644 data/2021/neurips/Conformal Time-series Forecasting
create mode 100644 data/2021/neurips/Conic Blackwell Algorithm: Parameter-Free Convex-Concave Saddle-Point Solving
create mode 100644 data/2021/neurips/Conservative Data Sharing for Multi-Task Offline Reinforcement Learning
create mode 100644 data/2021/neurips/Conservative Offline Distributional Reinforcement Learning
create mode 100644 data/2021/neurips/Consistency Regularization for Variational Auto-Encoders
create mode 100644 data/2021/neurips/Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers
create mode 100644 data/2021/neurips/Consistent Non-Parametric Methods for Maximizing Robustness
create mode 100644 data/2021/neurips/Constrained Optimization to Train Neural Networks on Critical and Under-Represented Classes
create mode 100644 data/2021/neurips/Constrained Robust Submodular Partitioning
create mode 100644 data/2021/neurips/Container: Context Aggregation Networks
create mode 100644 data/2021/neurips/Contextual Recommendations and Low-Regret Cutting-Plane Algorithms
create mode 100644 data/2021/neurips/Contextual Similarity Aggregation with Self-attention for Visual Re-ranking
create mode 100644 data/2021/neurips/Continual Auxiliary Task Learning
create mode 100644 data/2021/neurips/Continual Learning via Local Module Composition
create mode 100644 data/2021/neurips/Continual World: A Robotic Benchmark For Continual Reinforcement Learning
create mode 100644 data/2021/neurips/Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms
create mode 100644 data/2021/neurips/Continuous Doubly Constrained Batch Reinforcement Learning
create mode 100644 data/2021/neurips/Continuous Latent Process Flows
create mode 100644 data/2021/neurips/Continuous Mean-Covariance Bandits
create mode 100644 data/2021/neurips/Continuous vs. Discrete Optimization of Deep Neural Networks
create mode 100644 data/2021/neurips/Continuous-time edge modelling using non-parametric point processes
create mode 100644 data/2021/neurips/Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing
create mode 100644 data/2021/neurips/Contrastive Active Inference
create mode 100644 data/2021/neurips/Contrastive Graph Poisson Networks: Semi-Supervised Learning with Extremely Limited Labels
create mode 100644 data/2021/neurips/Contrastive Laplacian Eigenmaps
create mode 100644 data/2021/neurips/Contrastive Learning for Neural Topic Model
create mode 100644 data/2021/neurips/Contrastive Learning of Global and Local Video Representations
create mode 100644 data/2021/neurips/Contrastive Reinforcement Learning of Symbolic Reasoning Domains
create mode 100644 data/2021/neurips/Contrastively Disentangled Sequential Variational Autoencoder
create mode 100644 data/2021/neurips/Control Variates for Slate Off-Policy Evaluation
create mode 100644 data/2021/neurips/Controlled Text Generation as Continuous Optimization with Multiple Constraints
create mode 100644 data/2021/neurips/Controlling Neural Networks with Rule Representations
create mode 100644 data/2021/neurips/Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
create mode 100644 data/2021/neurips/Convergence of adaptive algorithms for constrained weakly convex optimization
create mode 100644 data/2021/neurips/Convex Polytope Trees and its Application to VAE
create mode 100644 data/2021/neurips/Convex-Concave Min-Max Stackelberg Games
create mode 100644 data/2021/neurips/Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training
create mode 100644 data/2021/neurips/Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback
create mode 100644 data/2021/neurips/Coordinated Proximal Policy Optimization
create mode 100644 data/2021/neurips/Coresets for Classification - Simplified and Strengthened
create mode 100644 data/2021/neurips/Coresets for Clustering with Missing Values
create mode 100644 data/2021/neurips/Coresets for Decision Trees of Signals
create mode 100644 data/2021/neurips/Coresets for Time Series Clustering
create mode 100644 data/2021/neurips/Correlated Stochastic Block Models: Exact Graph Matching with Applications to Recovering Communities
create mode 100644 data/2021/neurips/Corruption Robust Active Learning
create mode 100644 data/2021/neurips/CorticalFlow: A Diffeomorphic Mesh Transformer Network for Cortical Surface Reconstruction
create mode 100644 data/2021/neurips/Cortico-cerebellar networks as decoupling neural interfaces
create mode 100644 data/2021/neurips/Counterbalancing Learning and Strategic Incentives in Allocation Markets
create mode 100644 data/2021/neurips/Counterexample Guided RL Policy Refinement Using Bayesian Optimization
create mode 100644 data/2021/neurips/Counterfactual Explanations Can Be Manipulated
create mode 100644 data/2021/neurips/Counterfactual Explanations in Sequential Decision Making Under Uncertainty
create mode 100644 data/2021/neurips/Counterfactual Invariance to Spurious Correlations in Text Classification
create mode 100644 data/2021/neurips/Counterfactual Maximum Likelihood Estimation for Training Deep Networks
create mode 100644 data/2021/neurips/Coupled Gradient Estimators for Discrete Latent Variables
create mode 100644 data/2021/neurips/Coupled Segmentation and Edge Learning via Dynamic Graph Propagation
create mode 100644 data/2021/neurips/Covariance-Aware Private Mean Estimation Without Private Covariance Estimation
create mode 100644 data/2021/neurips/Credal Self-Supervised Learning
create mode 100644 data/2021/neurips/Credit Assignment Through Broadcasting a Global Error Vector
create mode 100644 data/2021/neurips/Credit Assignment in Neural Networks through Deep Feedback Control
create mode 100644 data/2021/neurips/Cross-modal Domain Adaptation for Cost-Efficient Visual Reinforcement Learning
create mode 100644 data/2021/neurips/Cross-view Geo-localization with Layer-to-Layer Transformer
create mode 100644 data/2021/neurips/CrypTen: Secure Multi-Party Computation Meets Machine Learning
create mode 100644 data/2021/neurips/Curriculum Design for Teaching via Demonstrations: Theory and Applications
create mode 100644 data/2021/neurips/Curriculum Disentangled Recommendation with Noisy Multi-feedback
create mode 100644 data/2021/neurips/Curriculum Learning for Vision-and-Language Navigation
create mode 100644 data/2021/neurips/Curriculum Offline Imitating Learning
create mode 100644 data/2021/neurips/Cycle Self-Training for Domain Adaptation
create mode 100644 data/2021/neurips/D2C: Diffusion-Decoding Models for Few-Shot Conditional Generation
create mode 100644 data/2021/neurips/DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks
create mode 100644 data/2021/neurips/DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer
create mode 100644 data/2021/neurips/DNN-based Topology Optimisation: Spatial Invariance and Neural Tangent Kernel
create mode 100644 data/2021/neurips/DOBF: A Deobfuscation Pre-Training Objective for Programming Languages
create mode 100644 data/2021/neurips/DOCTOR: A Simple Method for Detecting Misclassification Errors
create mode 100644 data/2021/neurips/DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples
create mode 100644 data/2021/neurips/DRIVE: One-bit Distributed Mean Estimation
create mode 100644 data/2021/neurips/DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras
create mode 100644 data/2021/neurips/DRONE: Data-aware Low-rank Compression for Large NLP Models
create mode 100644 data/2021/neurips/DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning
create mode 100644 data/2021/neurips/Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization
create mode 100644 data/2021/neurips/Dangers of Bayesian Model Averaging under Covariate Shift
create mode 100644 data/2021/neurips/Data Augmentation Can Improve Robustness
create mode 100644 data/2021/neurips/Data Sharing and Compression for Cooperative Networked Control
create mode 100644 data/2021/neurips/Data driven semi-supervised learning
create mode 100644 data/2021/neurips/Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective
create mode 100644 data/2021/neurips/Data-Efficient Instance Generation from Instance Discrimination
create mode 100644 data/2021/neurips/Dataset Distillation with Infinitely Wide Convolutional Networks
create mode 100644 data/2021/neurips/De-randomizing MCMC dynamics with the diffusion Stein operator
create mode 100644 data/2021/neurips/Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification
create mode 100644 data/2021/neurips/Debiased Visual Question Answering from Feature and Sample Perspectives
create mode 100644 data/2021/neurips/Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data
create mode 100644 data/2021/neurips/Decentralized Learning in Online Queuing Systems
create mode 100644 data/2021/neurips/Decentralized Q-learning in Zero-sum Markov Games
create mode 100644 data/2021/neurips/Decision Transformer: Reinforcement Learning via Sequence Modeling
create mode 100644 data/2021/neurips/Deconditional Downscaling with Gaussian Processes
create mode 100644 data/2021/neurips/Deconvolutional Networks on Graph Data
create mode 100644 data/2021/neurips/Decoupling the Depth and Scope of Graph Neural Networks
create mode 100644 data/2021/neurips/Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP
create mode 100644 data/2021/neurips/Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks
create mode 100644 data/2021/neurips/Deep Conditional Gaussian Mixture Model for Constrained Clustering
create mode 100644 data/2021/neurips/Deep Contextual Video Compression
create mode 100644 data/2021/neurips/Deep Explicit Duration Switching Models for Time Series
create mode 100644 data/2021/neurips/Deep Extended Hazard Models for Survival Analysis
create mode 100644 data/2021/neurips/Deep Extrapolation for Attribute-Enhanced Generation
create mode 100644 data/2021/neurips/Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings
create mode 100644 data/2021/neurips/Deep Learning Through the Lens of Example Difficulty
create mode 100644 data/2021/neurips/Deep Learning on a Data Diet: Finding Important Examples Early in Training
create mode 100644 data/2021/neurips/Deep Learning with Label Differential Privacy
create mode 100644 data/2021/neurips/Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis
create mode 100644 data/2021/neurips/Deep Markov Factor Analysis: Towards Concurrent Temporal and Spatial Analysis of fMRI Data
create mode 100644 data/2021/neurips/Deep Molecular Representation Learning via Fusing Physical and Chemical Information
create mode 100644 data/2021/neurips/Deep Networks Provably Classify Data on Curves
create mode 100644 data/2021/neurips/Deep Neural Networks as Point Estimates for Deep Gaussian Processes
create mode 100644 data/2021/neurips/Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation
create mode 100644 data/2021/neurips/Deep Reinforcement Learning at the Edge of the Statistical Precipice
create mode 100644 data/2021/neurips/Deep Residual Learning in Spiking Neural Networks
create mode 100644 data/2021/neurips/Deep Self-Dissimilarities as Powerful Visual Fingerprints
create mode 100644 data/2021/neurips/Deep Synoptic Monte-Carlo Planning in Reconnaissance Blind Chess
create mode 100644 data/2021/neurips/Deep inference of latent dynamics with spatio-temporal super-resolution using selective backpropagation through time
create mode 100644 data/2021/neurips/Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space
create mode 100644 data/2021/neurips/DeepGEM: Generalized Expectation-Maximization for Blind Inversion
create mode 100644 data/2021/neurips/DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning
create mode 100644 data/2021/neurips/DeepSITH: Efficient Learning via Decomposition of What and When Across Time Scales
create mode 100644 data/2021/neurips/Deeply Shared Filter Bases for Parameter-Efficient Convolutional Neural Networks
create mode 100644 data/2021/neurips/Deformable Butterfly: A Highly Structured and Sparse Linear Transform
create mode 100644 data/2021/neurips/Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning
create mode 100644 data/2021/neurips/Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems
create mode 100644 data/2021/neurips/Demystifying and Generalizing BinaryConnect
create mode 100644 data/2021/neurips/Denoising Normalizing Flow
create mode 100644 data/2021/neurips/Dense Keypoints via Multiview Supervision
create mode 100644 data/2021/neurips/Dense Unsupervised Learning for Video Segmentation
create mode 100644 data/2021/neurips/Densely connected normalizing flows
create mode 100644 data/2021/neurips/Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity
create mode 100644 data/2021/neurips/Design of Experiments for Stochastic Contextual Linear Bandits
create mode 100644 data/2021/neurips/Designing Counterfactual Generators using Deep Model Inversion
create mode 100644 data/2021/neurips/Detecting Anomalous Event Sequences with Temporal Point Processes
create mode 100644 data/2021/neurips/Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles
create mode 100644 data/2021/neurips/Detecting Individual Decision-Making Style: Exploring Behavioral Stylometry in Chess
create mode 100644 data/2021/neurips/Detecting Moments and Highlights in Videos via Natural Language Queries
create mode 100644 data/2021/neurips/Detecting and Adapting to Irregular Distribution Shifts in Bayesian Online Learning
create mode 100644 data/2021/neurips/Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD
create mode 100644 data/2021/neurips/DiBS: Differentiable Bayesian Structure Learning
create mode 100644 data/2021/neurips/Differentiable Annealed Importance Sampling and the Perils of Gradient Noise
create mode 100644 data/2021/neurips/Differentiable Equilibrium Computation with Decision Diagrams for Stackelberg Models of Combinatorial Congestion Games
create mode 100644 data/2021/neurips/Differentiable Learning Under Triage
create mode 100644 data/2021/neurips/Differentiable Multiple Shooting Layers
create mode 100644 data/2021/neurips/Differentiable Optimization of Generalized Nondecomposable Functions using Linear Programs
create mode 100644 data/2021/neurips/Differentiable Quality Diversity
create mode 100644 data/2021/neurips/Differentiable Simulation of Soft Multi-body Systems
create mode 100644 data/2021/neurips/Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks
create mode 100644 data/2021/neurips/Differentiable Spline Approximations
create mode 100644 data/2021/neurips/Differentiable Synthesis of Program Architectures
create mode 100644 data/2021/neurips/Differentiable Unsupervised Feature Selection based on a Gated Laplacian
create mode 100644 data/2021/neurips/Differentiable rendering with perturbed optimizers
create mode 100644 data/2021/neurips/Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent
create mode 100644 data/2021/neurips/Differential Privacy Over Riemannian Manifolds
create mode 100644 data/2021/neurips/Differentially Private Empirical Risk Minimization under the Fairness Lens
create mode 100644 data/2021/neurips/Differentially Private Federated Bayesian Optimization with Distributed Exploration
create mode 100644 data/2021/neurips/Differentially Private Learning with Adaptive Clipping
create mode 100644 data/2021/neurips/Differentially Private Model Personalization
create mode 100644 data/2021/neurips/Differentially Private Multi-Armed Bandits in the Shuffle Model
create mode 100644 data/2021/neurips/Differentially Private Sampling from Distributions
create mode 100644 data/2021/neurips/Differentially Private Stochastic Optimization: New Results in Convex and Non-Convex Settings
create mode 100644 data/2021/neurips/Differentially Private n-gram Extraction
create mode 100644 data/2021/neurips/Diffusion Models Beat GANs on Image Synthesis
create mode 100644 data/2021/neurips/Diffusion Normalizing Flow
create mode 100644 "data/2021/neurips/Diffusion Schr\303\266dinger Bridge with Applications to Score-Based Generative Modeling"
create mode 100644 data/2021/neurips/Dimension-free empirical entropy estimation
create mode 100644 data/2021/neurips/Dimensionality Reduction for Wasserstein Barycenter
create mode 100644 data/2021/neurips/Direct Multi-view Multi-person 3D Pose Estimation
create mode 100644 data/2021/neurips/Directed Graph Contrastive Learning
create mode 100644 data/2021/neurips/Directed Probabilistic Watershed
create mode 100644 data/2021/neurips/Directed Spectrum Measures Improve Latent Network Models Of Neural Populations
create mode 100644 data/2021/neurips/Directional Message Passing on Molecular Graphs via Synthetic Coordinates
create mode 100644 data/2021/neurips/Dirichlet Energy Constrained Learning for Deep Graph Neural Networks
create mode 100644 data/2021/neurips/Discerning Decision-Making Process of Deep Neural Networks with Hierarchical Voting Transformation
create mode 100644 data/2021/neurips/Discovering Dynamic Salient Regions for Spatio-Temporal Graph Neural Networks
create mode 100644 data/2021/neurips/Discovering and Achieving Goals via World Models
create mode 100644 data/2021/neurips/Discovery of Options via Meta-Learned Subgoals
create mode 100644 data/2021/neurips/Discrete-Valued Neural Communication
create mode 100644 data/2021/neurips/Disentangled Contrastive Learning on Graphs
create mode 100644 data/2021/neurips/Disentangling Identifiable Features from Noisy Data with Structured Nonlinear ICA
create mode 100644 data/2021/neurips/Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect
create mode 100644 data/2021/neurips/Disrupting Deep Uncertainty Estimation Without Harming Accuracy
create mode 100644 data/2021/neurips/Dissecting the Diffusion Process in Linear Graph Convolutional Networks
create mode 100644 data/2021/neurips/Distilling Image Classifiers in Object Detectors
create mode 100644 data/2021/neurips/Distilling Meta Knowledge on Heterogeneous Graph for Illicit Drug Trafficker Detection on Social Media
create mode 100644 data/2021/neurips/Distilling Object Detectors with Feature Richness
create mode 100644 data/2021/neurips/Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck
create mode 100644 data/2021/neurips/Distributed Deep Learning In Open Collaborations
create mode 100644 data/2021/neurips/Distributed Estimation with Multiple Samples per User: Sharp Rates and Phase Transition
create mode 100644 data/2021/neurips/Distributed Machine Learning with Sparse Heterogeneous Data
create mode 100644 data/2021/neurips/Distributed Principal Component Analysis with Limited Communication
create mode 100644 data/2021/neurips/Distributed Saddle-Point Problems Under Data Similarity
create mode 100644
data/2021/neurips/Distributed Zero-Order Optimization under Adversarial Noise create mode 100644 data/2021/neurips/Distribution-free inference for regression: discrete, continuous, and in between create mode 100644 data/2021/neurips/Distributional Gradient Matching for Learning Uncertain Neural Dynamics Models create mode 100644 data/2021/neurips/Distributional Reinforcement Learning for Multi-Dimensional Reward Functions create mode 100644 data/2021/neurips/Distributionally Robust Imitation Learning create mode 100644 data/2021/neurips/Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals create mode 100644 data/2021/neurips/Diverse Message Passing for Attribute with Heterophily create mode 100644 data/2021/neurips/Diversity Enhanced Active Learning with Strictly Proper Scoring Rules create mode 100644 data/2021/neurips/Diversity Matters When Learning From Ensembles create mode 100644 data/2021/neurips/Do Different Tracking Tasks Require Different Appearance Models? create mode 100644 data/2021/neurips/Do Input Gradients Highlight Discriminative Features? create mode 100644 data/2021/neurips/Do Neural Optimal Transport Solvers Work? A Continuous Wasserstein-2 Benchmark create mode 100644 data/2021/neurips/Do Transformers Really Perform Badly for Graph Representation? create mode 100644 data/2021/neurips/Do Vision Transformers See Like Convolutional Neural Networks? create mode 100644 data/2021/neurips/Do Wider Neural Networks Really Help Adversarial Robustness? create mode 100644 data/2021/neurips/Does Knowledge Distillation Really Work? create mode 100644 data/2021/neurips/Does Preprocessing Help Training Over-parameterized Neural Networks? create mode 100644 data/2021/neurips/Does enforcing fairness mitigate biases caused by subpopulation shift? create mode 100644 data/2021/neurips/Domain Adaptation with Invariant Representation Learning: What Transformations to Learn? 
create mode 100644 data/2021/neurips/Domain Invariant Representation Learning with Domain Density Transformations
create mode 100644 data/2021/neurips/DominoSearch: Find layer-wise fine-grained N: M sparse schemes from dense neural networks
create mode 100644 data/2021/neurips/Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence
create mode 100644 data/2021/neurips/Double Debiased Machine Learning for Dynamic Treatment Effects
create mode 100644 data/2021/neurips/Double Machine Learning Density Estimation for Local Treatment Effects with Instruments
create mode 100644 data/2021/neurips/Doubly Robust Thompson Sampling with Linear Payoffs
create mode 100644 data/2021/neurips/Dr Jekyll & Mr Hyde: the strange case of off-policy policy updates
create mode 100644 data/2021/neurips/Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks
create mode 100644 data/2021/neurips/Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity
create mode 100644 data/2021/neurips/Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers
create mode 100644 data/2021/neurips/DropGNN: Random Dropouts Increase the Expressiveness of Graph Neural Networks
create mode 100644 data/2021/neurips/Dual Adaptivity: A Universal Algorithm for Minimizing the Adaptive Regret of Convex Functions
create mode 100644 data/2021/neurips/Dual Parameterization of Sparse Variational Gaussian Processes
create mode 100644 data/2021/neurips/Dual Progressive Prototype Network for Generalized Zero-Shot Learning
create mode 100644 data/2021/neurips/Dual-stream Network for Visual Recognition
create mode 100644 data/2021/neurips/DualNet: Continual Learning, Fast and Slow
create mode 100644 data/2021/neurips/Dueling Bandits with Adversarial Sleeping
create mode 100644 data/2021/neurips/Dueling Bandits with Team Comparisons
create mode 100644 data/2021/neurips/Duplex Sequence-to-Sequence Learning for Reversible Machine Translation
create mode 100644 data/2021/neurips/Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking
create mode 100644 data/2021/neurips/Dynamic Analysis of Higher-Order Coordination in Neuronal Assemblies via De-Sparsified Orthogonal Matching Pursuit
create mode 100644 data/2021/neurips/Dynamic Bottleneck for Robust Self-Supervised Exploration
create mode 100644 data/2021/neurips/Dynamic COVID risk assessment accounting for community virus exposure from a spatial-temporal transmission model
create mode 100644 data/2021/neurips/Dynamic Causal Bayesian Optimization
create mode 100644 data/2021/neurips/Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data
create mode 100644 data/2021/neurips/Dynamic Grained Encoder for Vision Transformers
create mode 100644 data/2021/neurips/Dynamic Inference with Neural Interpreters
create mode 100644 data/2021/neurips/Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation
create mode 100644 data/2021/neurips/Dynamic Normalization and Relay for Video Action Recognition
create mode 100644 data/2021/neurips/Dynamic Resolution Network
create mode 100644 data/2021/neurips/Dynamic Sasvi: Strong Safe Screening for Norm-Regularized Least Squares
create mode 100644 data/2021/neurips/Dynamic Trace Estimation
create mode 100644 data/2021/neurips/Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
create mode 100644 data/2021/neurips/Dynamic influence maximization
create mode 100644 data/2021/neurips/Dynamic population-based meta-learning for multi-agent communication with natural language
create mode 100644 data/2021/neurips/DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
create mode 100644 data/2021/neurips/Dynamical Wasserstein Barycenters for Time-series Modeling
create mode 100644 data/2021/neurips/Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models
create mode 100644 data/2021/neurips/Dynamics-regulated kinematic policy for egocentric pose estimation
create mode 100644 data/2021/neurips/E(n) Equivariant Normalizing Flows
create mode 100644 data/2021/neurips/EDGE: Explaining Deep Reinforcement Learning Policies
create mode 100644 data/2021/neurips/EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback
create mode 100644 data/2021/neurips/EIGNN: Efficient Infinite-Depth Graph Neural Networks
create mode 100644 data/2021/neurips/ELLA: Exploration through Learned Language Abstraction
create mode 100644 data/2021/neurips/Early Convolutions Help Transformers See Better
create mode 100644 data/2021/neurips/Early-stopped neural networks are consistent
create mode 100644 data/2021/neurips/Edge Representation Learning with Hypergraphs
create mode 100644 data/2021/neurips/EditGAN: High-Precision Semantic Image Editing
create mode 100644 data/2021/neurips/Editing a classifier by rewriting its prediction rules
create mode 100644 data/2021/neurips/Effective Meta-Regularization by Kernelized Proximal Regularization
create mode 100644 data/2021/neurips/Efficient Active Learning for Gaussian Process Classification by Error Reduction
create mode 100644 data/2021/neurips/Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations
create mode 100644 data/2021/neurips/Efficient Bayesian network structure learning via local Markov boundary search
create mode 100644 data/2021/neurips/Efficient Combination of Rematerialization and Offloading for Training DNNs
create mode 100644 data/2021/neurips/Efficient Equivariant Network
create mode 100644 data/2021/neurips/Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination
create mode 100644 data/2021/neurips/Efficient Generalization with Distributionally Robust Learning
create mode 100644 data/2021/neurips/Efficient Learning of Discrete-Continuous Computation Graphs
create mode 100644 data/2021/neurips/Efficient Mirror Descent Ascent Methods for Nonsmooth Minimax Problems
create mode 100644 data/2021/neurips/Efficient Neural Network Training via Forward and Backward Propagation Sparsification
create mode 100644 data/2021/neurips/Efficient Online Estimation of Causal Effects by Deciding What to Observe
create mode 100644 data/2021/neurips/Efficient Statistical Assessment of Neural Network Corruption Robustness
create mode 100644 data/2021/neurips/Efficient Training of Retrieval Models using Negative Cache
create mode 100644 data/2021/neurips/Efficient Training of Visual Transformers with Small Datasets
create mode 100644 data/2021/neurips/Efficient Truncated Linear Regression with Unknown Noise Variance
create mode 100644 data/2021/neurips/Efficient and Accurate Gradients for Neural SDEs
create mode 100644 data/2021/neurips/Efficient and Local Parallel Random Walks
create mode 100644 data/2021/neurips/Efficient constrained sampling via the mirror-Langevin algorithm
create mode 100644 data/2021/neurips/Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroimaging
create mode 100644 data/2021/neurips/Efficient methods for Gaussian Markov random fields under sparse linear constraints
create mode 100644 data/2021/neurips/Efficiently Identifying Task Groupings for Multi-Task Learning
create mode 100644 data/2021/neurips/Efficiently Learning One Hidden Layer ReLU Networks From Queries
create mode 100644 data/2021/neurips/Embedding Principle of Loss Landscape of Deep Neural Networks
create mode 100644 data/2021/neurips/Emergent Communication of Generalizations
create mode 100644 data/2021/neurips/Emergent Communication under Varying Sizes and Connectivities
create mode 100644 data/2021/neurips/Emergent Discrete Communication in Semantic Spaces
create mode 100644 data/2021/neurips/Enabling Fast Differentially Private SGD via Just-in-Time Compilation and Vectorization
create mode 100644 data/2021/neurips/Encoding Robustness to Image Style via Adversarial Feature Perturbations
create mode 100644 data/2021/neurips/Encoding Spatial Distribution of Convolutional Features for Texture Representation
create mode 100644 data/2021/neurips/End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering
create mode 100644 data/2021/neurips/End-to-End Weak Supervision
create mode 100644 data/2021/neurips/End-to-end Multi-modal Video Temporal Grounding
create mode 100644 data/2021/neurips/End-to-end reconstruction meets data-driven regularization for inverse problems
create mode 100644 data/2021/neurips/Ensembling Graph Predictions for AMR Parsing
create mode 100644 data/2021/neurips/Entropic Desired Dynamics for Intrinsic Control
create mode 100644 data/2021/neurips/Entropy-based adaptive Hamiltonian Monte Carlo
create mode 100644 data/2021/neurips/Environment Generation for Zero-Shot Compositional Reinforcement Learning
create mode 100644 data/2021/neurips/Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration
create mode 100644 data/2021/neurips/Equilibrium Refinement for the Age of Machines: The One-Sided Quasi-Perfect Equilibrium
create mode 100644 data/2021/neurips/Equilibrium and non-Equilibrium regimes in the learning of Restricted Boltzmann Machines
create mode 100644 data/2021/neurips/Equivariant Manifold Flows
create mode 100644 data/2021/neurips/Error Compensated Distributed SGD Can Be Accelerated
create mode 100644 data/2021/neurips/ErrorCompensatedX: error compensation for variance reduced algorithms
create mode 100644 data/2021/neurips/Escape saddle points by a simple gradient-descent based algorithm
create mode 100644 data/2021/neurips/Escaping Saddle Points with Compressed SGD
create mode 100644 data/2021/neurips/Estimating High Order Gradients of the Data Distribution by Denoising
create mode 100644 data/2021/neurips/Estimating Multi-cause Treatment Effects via Single-cause Perturbation
create mode 100644 data/2021/neurips/Estimating the Long-Term Effects of Novel Treatments
create mode 100644 data/2021/neurips/Estimating the Unique Information of Continuous Variables
create mode 100644 data/2021/neurips/Evaluating Efficient Performance Estimators of Neural Architectures
create mode 100644 data/2021/neurips/Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
create mode 100644 data/2021/neurips/Evaluating State-of-the-Art Classification Models Against Bayes Optimality
create mode 100644 data/2021/neurips/Evaluating model performance under worst-case subpopulations
create mode 100644 data/2021/neurips/Evaluation of Human-AI Teams for Learned and Rule-Based Agents in Hanabi
create mode 100644 data/2021/neurips/Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation
create mode 100644 data/2021/neurips/Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models
create mode 100644 data/2021/neurips/EvoGrad: Efficient Gradient-Based Meta-Learning and Hyperparameter Optimization
create mode 100644 data/2021/neurips/Evolution Gym: A Large-Scale Benchmark for Evolving Soft Robots
create mode 100644 data/2021/neurips/Exact Privacy Guarantees for Markov Chain Implementations of the Exponential Mechanism with Artificial Atoms
create mode 100644 data/2021/neurips/Exact marginal prior distributions of finite Bayesian neural networks
create mode 100644 data/2021/neurips/Excess Capacity and Backdoor Poisoning
create mode 100644 data/2021/neurips/Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning
create mode 100644 data/2021/neurips/Explaining Hyperparameter Optimization via Partial Dependence Plots
create mode 100644 data/2021/neurips/Explaining Latent Representations with a Corpus of Examples
create mode 100644 data/2021/neurips/Explaining heterogeneity in medial entorhinal cortex with task-driven neural networks
create mode 100644 data/2021/neurips/Explanation-based Data Augmentation for Image Classification
create mode 100644 data/2021/neurips/Explicable Reward Design for Reinforcement Learning Agents
create mode 100644 data/2021/neurips/Explicit loss asymptotics in the gradient descent training of neural networks
create mode 100644 data/2021/neurips/Exploiting Chain Rule and Bayes' Theorem to Compare Probability Distributions
create mode 100644 data/2021/neurips/Exploiting Data Sparsity in Secure Cross-Platform Social Recommendation
create mode 100644 data/2021/neurips/Exploiting Domain-Specific Features to Enhance Domain Generalization
create mode 100644 data/2021/neurips/Exploiting Local Convergence of Quasi-Newton Methods Globally: Adaptive Sample Size Approach
create mode 100644 data/2021/neurips/Exploiting Opponents Under Utility Constraints in Sequential Games
create mode 100644 data/2021/neurips/Exploiting a Zoo of Checkpoints for Unseen Tasks
create mode 100644 data/2021/neurips/Exploiting the Intrinsic Neighborhood Structure for Source-free Domain Adaptation
create mode 100644 data/2021/neurips/Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality
create mode 100644 data/2021/neurips/Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks
create mode 100644 data/2021/neurips/Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing
create mode 100644 data/2021/neurips/Exploring Forensic Dental Identification with Deep Learning
create mode 100644 data/2021/neurips/Exploring Social Posterior Collapse in Variational Autoencoder for Interaction Modeling
create mode 100644 data/2021/neurips/Exploring the Limits of Out-of-Distribution Detection
create mode 100644 data/2021/neurips/Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning
create mode 100644 data/2021/neurips/Exponential Graph is Provably Efficient for Decentralized Deep Training
create mode 100644 data/2021/neurips/Exponential Separation between Two Learning Models and Adversarial Robustness
create mode 100644 data/2021/neurips/Extending Lagrangian and Hamiltonian Neural Networks with Differentiable Contact Models
create mode 100644 data/2021/neurips/Extracting Deformation-Aware Local Features by Learning to Deform
create mode 100644 data/2021/neurips/FACMAC: Factored Multi-Agent Centralised Policy Gradients
create mode 100644 data/2021/neurips/FINE Samples for Learning with Noisy Labels
create mode 100644 data/2021/neurips/FL-WBC: Enhancing Robustness against Model Poisoning Attacks in Federated Learning from a Client Perspective
create mode 100644 data/2021/neurips/FLEX: Unifying Evaluation for Few-Shot NLP
create mode 100644 data/2021/neurips/FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
create mode 100644 data/2021/neurips/Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs
create mode 100644 data/2021/neurips/Fair Algorithms for Multi-Agent Multi-Armed Bandits
create mode 100644 data/2021/neurips/Fair Classification with Adversarial Perturbations
create mode 100644 data/2021/neurips/Fair Clustering Under a Bounded Cost
create mode 100644 data/2021/neurips/Fair Exploration via Axiomatic Bargaining
create mode 100644 data/2021/neurips/Fair Scheduling for Time-dependent Resources
create mode 100644 data/2021/neurips/Fair Sequential Selection Using Supervised Learning Models
create mode 100644 data/2021/neurips/Fair Sortition Made Transparent
create mode 100644 data/2021/neurips/Fair Sparse Regression with Clustering: An Invex Relaxation for a Combinatorial Problem
create mode 100644 data/2021/neurips/Fairness in Ranking under Uncertainty
create mode 100644 data/2021/neurips/Fairness via Representation Neutralization
create mode 100644 data/2021/neurips/Fast Abductive Learning by Similarity-based Consistency Optimization
create mode 100644 data/2021/neurips/Fast Approximate Dynamic Programming for Infinite-Horizon Markov Decision Processes
create mode 100644 data/2021/neurips/Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections
create mode 100644 data/2021/neurips/Fast Axiomatic Attribution for Neural Networks
create mode 100644 data/2021/neurips/Fast Bayesian Inference for Gaussian Cox Processes via Path Integral Formulation
create mode 100644 data/2021/neurips/Fast Certified Robust Training with Short Warmup
create mode 100644 data/2021/neurips/Fast Doubly-Adaptive MCMC to Estimate the Gibbs Partition Function with Weak Mixing Time Bounds
create mode 100644 data/2021/neurips/Fast Extra Gradient Methods for Smooth Structured Nonconvex-Nonconcave Minimax Problems
create mode 100644 data/2021/neurips/Fast Federated Learning in the Presence of Arbitrary Device Unavailability
create mode 100644 data/2021/neurips/Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints
create mode 100644 data/2021/neurips/Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification
create mode 100644 data/2021/neurips/Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
create mode 100644 data/2021/neurips/Fast Projection onto the Capped Simplex with Applications to Sparse Regression in Bioinformatics
create mode 100644 data/2021/neurips/Fast Pure Exploration via Frank-Wolfe
create mode 100644 data/2021/neurips/Fast Routing under Uncertainty: Adaptive Learning in Congestion Games via Exponential Weights
create mode 100644 data/2021/neurips/Fast Training Method for Stochastic Compositional Optimization Problems
create mode 100644 data/2021/neurips/Fast Training of Neural Lumigraph Representations using Meta Learning
create mode 100644 data/2021/neurips/Fast Tucker Rank Reduction for Non-Negative Tensors Using Mean-Field Approximation
create mode 100644 data/2021/neurips/Fast and Memory Efficient Differentially Private-SGD via JL Projections
create mode 100644 data/2021/neurips/Fast and accurate randomized algorithms for low-rank tensor decompositions
create mode 100644 data/2021/neurips/Fast rates for prediction with limited expert advice
create mode 100644 data/2021/neurips/FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
create mode 100644 data/2021/neurips/Faster Algorithms and Constant Lower Bounds for the Worst-Case Expected Error
create mode 100644 data/2021/neurips/Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data
create mode 100644 data/2021/neurips/Faster Matchings via Learned Duals
create mode 100644 data/2021/neurips/Faster Neural Network Training with Approximate Tensor Operations
create mode 100644 data/2021/neurips/Faster Non-asymptotic Convergence for Double Q-learning
create mode 100644 data/2021/neurips/Faster proximal algorithms for matrix optimization using Jacobi-based eigenvalue methods
create mode 100644 data/2021/neurips/Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee
create mode 100644 data/2021/neurips/FedDR - Randomized Douglas-Rachford Splitting Algorithms for Nonconvex Federated Composite Optimization
create mode 100644 data/2021/neurips/Federated Graph Classification over Non-IID Graphs
create mode 100644 data/2021/neurips/Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing
create mode 100644 data/2021/neurips/Federated Linear Contextual Bandits
create mode 100644 data/2021/neurips/Federated Multi-Task Learning under a Mixture of Distributions
create mode 100644 data/2021/neurips/Federated Reconstruction: Partially Local Federated Learning
create mode 100644 data/2021/neurips/Federated Split Task-Agnostic Vision Transformer for COVID-19 CXR Diagnosis
create mode 100644 data/2021/neurips/Federated-EM with heterogeneity mitigation and variance reduction
create mode 100644 data/2021/neurips/Few-Round Learning for Federated Learning
create mode 100644 data/2021/neurips/Few-Shot Data-Driven Algorithms for Low Rank Approximation
create mode 100644 data/2021/neurips/Few-Shot Object Detection via Association and DIscrimination
create mode 100644 data/2021/neurips/Few-Shot Segmentation via Cycle-Consistent Transformer
create mode 100644 data/2021/neurips/Finding Bipartite Components in Hypergraphs
create mode 100644 data/2021/neurips/Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution
create mode 100644 data/2021/neurips/Finding Optimal Tangent Points for Reducing Distortions of Hard-label Attacks
create mode 100644 data/2021/neurips/Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance
create mode 100644 data/2021/neurips/Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information
create mode 100644 data/2021/neurips/Fine-Grained Zero-Shot Learning with DNA as Side Information
create mode 100644 data/2021/neurips/Fine-grained Generalization Analysis of Inductive Matrix Completion
create mode 100644 data/2021/neurips/Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning
create mode 100644 data/2021/neurips/Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators
create mode 100644 data/2021/neurips/Fitting summary statistics of neural data with a differentiable spiking network simulator
create mode 100644 data/2021/neurips/Fixes That Fail: Self-Defeating Improvements in Machine-Learning Systems
create mode 100644 data/2021/neurips/FjORD: Fair and Accurate Federated Learning under heterogeneous targets with Ordered Dropout
create mode 100644 data/2021/neurips/Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning
create mode 100644 data/2021/neurips/FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling
create mode 100644 data/2021/neurips/Flexible Option Learning
create mode 100644 data/2021/neurips/Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation
create mode 100644 data/2021/neurips/Focal Attention for Long-Range Interactions in Vision Transformers
create mode 100644 data/2021/neurips/For high-dimensional hierarchical models, consider exchangeability of effects across covariates instead of across datasets
create mode 100644 data/2021/neurips/Formalizing Generalization and Adversarial Robustness of Neural Networks to Weight Perturbations
create mode 100644 data/2021/neurips/Formalizing the Generalization-Forgetting Trade-off in Continual Learning
create mode 100644 data/2021/neurips/Forster Decomposition and Learning Halfspaces with Noise
create mode 100644 data/2021/neurips/Foundations of Symbolic Languages for Model Interpretability
create mode 100644 data/2021/neurips/Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms
create mode 100644 data/2021/neurips/Framing RNN as a kernel method: A neural ODE approach
create mode 100644 data/2021/neurips/From Canonical Correlation Analysis to Self-supervised Graph Neural Networks
create mode 100644 data/2021/neurips/From Optimality to Robustness: Adaptive Re-Sampling Strategies in Stochastic Bandits
create mode 100644 data/2021/neurips/From global to local MDI variable importances for random forests and when they are Shapley values
create mode 100644 data/2021/neurips/Functional Neural Networks for Parametric Image Restoration Problems
create mode 100644 data/2021/neurips/Functional Regularization for Reinforcement Learning via Learned Fourier Features
create mode 100644 data/2021/neurips/Functional Variational Inference based on Stochastic Process Generators
create mode 100644 data/2021/neurips/Functionally Regionalized Knowledge Transfer for Low-resource Drug Discovery
create mode 100644 data/2021/neurips/Fuzzy Clustering with Similarity Queries
create mode 100644 data/2021/neurips/G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators
create mode 100644 data/2021/neurips/GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement
create mode 100644 data/2021/neurips/GRIN: Generative Relation and Intention Network for Multi-agent Trajectory Prediction
create mode 100644 data/2021/neurips/Garment4D: Garment Reconstruction from Point Cloud Sequences
create mode 100644 data/2021/neurips/Gauge Equivariant Transformer
create mode 100644 data/2021/neurips/Gaussian Kernel Mixture Network for Single Image Defocus Deblurring
create mode 100644 data/2021/neurips/GemNet: Universal Directional Graph Neural Networks for Molecules
create mode 100644 data/2021/neurips/General Low-rank Matrix Optimization: Geometric Analysis and Sharper Bounds
create mode 100644 data/2021/neurips/General Nonlinearities in SO(2)-Equivariant CNNs
create mode 100644 data/2021/neurips/Generalizable Imitation Learning from Observation via Inferring Goal Proximity
create mode 100644 data/2021/neurips/Generalizable Multi-linear Attention Network
create mode 100644 data/2021/neurips/Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis
create mode 100644 data/2021/neurips/Generalization Bounds for (Wasserstein) Robust Optimization
create mode 100644 data/2021/neurips/Generalization Bounds for Graph Embedding Using Negative Sampling: Linear vs Hyperbolic
create mode 100644 data/2021/neurips/Generalization Bounds for Meta-Learning via PAC-Bayes and Uniform Stability
create mode 100644 data/2021/neurips/Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime
create mode 100644 data/2021/neurips/Generalization Guarantee of SGD for Pairwise Learning
create mode 100644 data/2021/neurips/Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks
create mode 100644 data/2021/neurips/Generalized DataWeighting via Class-Level Gradient Manipulation
create mode 100644 data/2021/neurips/Generalized Depthwise-Separable Convolutions for Adversarially Robust and Efficient Neural Networks
create mode 100644 data/2021/neurips/Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels
create mode 100644 data/2021/neurips/Generalized Linear Bandits with Local Differential Privacy
create mode 100644 data/2021/neurips/Generalized Proximal Policy Optimization with Sample Reuse
create mode 100644 data/2021/neurips/Generalized Shape Metrics on Neural Representations
create mode 100644 data/2021/neurips/Generalized and Discriminative Few-Shot Object Detection via SVD-Dictionary Enhancement
create mode 100644 data/2021/neurips/Generating High-Quality Explanations for Navigation in Partially-Revealed Environments
create mode 100644 data/2021/neurips/Generative Occupancy Fields for 3D Surface-Aware Image Synthesis
create mode 100644 data/2021/neurips/Generative vs. Discriminative: Rethinking The Meta-Continual Learning
create mode 100644 data/2021/neurips/Generic Neural Architecture Search via Regression
create mode 100644 data/2021/neurips/GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles
create mode 100644 data/2021/neurips/Geometry Processing with Neural Fields
create mode 100644 data/2021/neurips/Glance-and-Gaze Vision Transformer
create mode 100644 data/2021/neurips/Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization
create mode 100644 data/2021/neurips/Global Convergence of Online Optimization for Nonlinear Model Predictive Control
create mode 100644 data/2021/neurips/Global Convergence to Local Minmax Equilibrium in Classes of Nonconvex Zero-Sum Games
create mode 100644 data/2021/neurips/Global Filter Networks for Image Classification
create mode 100644 data/2021/neurips/Global-aware Beam Search for Neural Abstractive Summarization
create mode 100644 data/2021/neurips/Going Beyond Linear RL: Sample Efficient Neural Function Approximation
create mode 100644 data/2021/neurips/Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
create mode 100644 data/2021/neurips/Gone Fishing: Neural Active Learning with Fisher Embeddings
create mode 100644 data/2021/neurips/Good Classification Measures and How to Find Them
create mode 100644 data/2021/neurips/Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation
create mode 100644 data/2021/neurips/GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
create mode 100644 data/2021/neurips/Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias
create mode 100644 data/2021/neurips/Gradient Driven Rewards to Guarantee Fairness in Collaborative Machine Learning
create mode 100644 data/2021/neurips/Gradient Inversion with Generative Image Prior
create mode 100644 data/2021/neurips/Gradient-Free Adversarial Training Against Image Corruption for Learning-based Steering
create mode 100644 data/2021/neurips/Gradient-based Editing of Memory Examples for Online Task-free Continual Learning
create mode 100644 data/2021/neurips/Gradient-based Hyperparameter Optimization Over Long Horizons
create mode 100644 data/2021/neurips/Gradual Domain Adaptation without Indexed Intermediate Domains
create mode 100644 data/2021/neurips/Grammar-Based Grounded Lexicon Learning
create mode 100644 data/2021/neurips/Graph Adversarial Self-Supervised Learning
create mode 100644 data/2021/neurips/Graph Differentiable Architecture Search with Structure Learning
create mode 100644 data/2021/neurips/Graph Neural Networks with Adaptive Residual
create mode 100644 data/2021/neurips/Graph Neural Networks with Local Graph Parameters
create mode 100644 data/2021/neurips/Graph Posterior Network: Bayesian Predictive Uncertainty for Node Classification
create mode 100644 data/2021/neurips/GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph
create mode 100644 data/2021/neurips/Graphical Models in Heavy-Tailed Markets
create mode 100644 data/2021/neurips/Greedy Approximation Algorithms for Active Sequential Hypothesis Testing
create mode 100644 data/2021/neurips/Greedy and Random Quasi-Newton Methods with Faster Explicit Superlinear Convergence
create mode 100644 data/2021/neurips/Grounding Representation Similarity Through Statistical Testing
create mode 100644 data/2021/neurips/Grounding Spatio-Temporal Language with Transformers
create mode 100644 data/2021/neurips/Grounding inductive biases in natural images: invariance stems from variations in data
create mode 100644 data/2021/neurips/Group Equivariant Subsampling
create mode 100644 data/2021/neurips/H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion
create mode 100644 data/2021/neurips/HNPE: Leveraging Global Parameters for Neural Posterior Estimation
create mode 100644 data/2021/neurips/HRFormer: High-Resolution Vision Transformer for Dense Predict
create mode 100644 data/2021/neurips/HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning
create mode 100644 data/2021/neurips/Habitat 2.0: Training Home Assistants to Rearrange their Habitat
create mode 100644 data/2021/neurips/Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling
create mode 100644 data/2021/neurips/Handling Long-tailed Feature Distribution in AdderNets
create mode 100644 data/2021/neurips/Hard-Attention for Scalable Image Classification
create mode 100644 data/2021/neurips/Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning
create mode 100644 data/2021/neurips/Hash Layers For Large Sparse Models
create mode 100644 data/2021/neurips/Heavy Ball Momentum for Conditional Gradient
create mode 100644 data/2021/neurips/Heavy Ball Neural Ordinary Differential Equations
create mode 100644 data/2021/neurips/Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks
create mode 100644 data/2021/neurips/Hessian Eigenspectra of More Realistic Nonlinear Models
create mode 100644 data/2021/neurips/Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization
create mode 100644 data/2021/neurips/Heuristic-Guided Reinforcement Learning
create mode 100644 data/2021/neurips/Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs
create mode 100644 data/2021/neurips/Hierarchical Reinforcement Learning with Timed Subgoals
create mode 100644 data/2021/neurips/Hierarchical Skills for Efficient Exploration
create mode 100644 data/2021/neurips/High Probability Complexity Bounds for Line Search Based on Stochastic Oracles
create mode 100644 data/2021/neurips/High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails
create mode 100644 data/2021/neurips/Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes
create mode 100644 data/2021/neurips/Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL
create mode 100644 data/2021/neurips/History Aware Multimodal Transformer for Vision-and-Language Navigation
create mode 100644 data/2021/neurips/Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation
create mode 100644 data/2021/neurips/How Data Augmentation affects Optimization for Linear Regression
create mode 100644 data/2021/neurips/How Does it Sound?
create mode 100644 data/2021/neurips/How Fine-Tuning Allows for Effective Meta-Learning
create mode 100644 data/2021/neurips/How Modular should Neural Module Networks Be for Systematic Generalization?
create mode 100644 data/2021/neurips/How Powerful are Performance Predictors in Neural Architecture Search?
create mode 100644 data/2021/neurips/How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?
create mode 100644 data/2021/neurips/How Tight Can PAC-Bayes be in the Small Data Regime?
create mode 100644 data/2021/neurips/How Well do Feature Visualizations Support Causal Understanding of CNN Activations?
create mode 100644 data/2021/neurips/How can classical multidimensional scaling go wrong?
create mode 100644 data/2021/neurips/How does a Neural Network's Architecture Impact its Robustness to Noisy Labels?
create mode 100644 data/2021/neurips/How to transfer algorithmic reasoning knowledge to learn new algorithms?
create mode 100644 data/2021/neurips/Human-Adversarial Visual Question Answering
create mode 100644 data/2021/neurips/Hybrid Regret Bounds for Combinatorial Semi-Bandits and Adversarial Linear Bandits
create mode 100644 data/2021/neurips/HyperSPNs: Compact and Expressive Probabilistic Circuits
create mode 100644 data/2021/neurips/Hyperbolic Busemann Learning with Ideal Prototypes
create mode 100644 data/2021/neurips/Hyperbolic Procrustes Analysis Using Riemannian Geometry
create mode 100644 data/2021/neurips/Hypergraph Propagation and Community Selection for Objects Retrieval
create mode 100644 data/2021/neurips/Hyperparameter Optimization Is Deceiving Us, and How to Stop It
create mode 100644 data/2021/neurips/Hyperparameter Tuning is All You Need for LISTA
create mode 100644 data/2021/neurips/INDIGO: GNN-Based Inductive Knowledge Graph Completion Using Pair-Wise Encoding
create mode 100644 data/2021/neurips/IQ-Learn: Inverse soft-Q Learning for Imitation
create mode 100644 data/2021/neurips/IRM - when it works and when it doesn't: A test case of natural language inference
create mode 100644 data/2021/neurips/Identifiability in inverse reinforcement learning
create mode 100644 data/2021/neurips/Identifiable Generative models for Missing Not at Random Data Imputation
create mode 100644 data/2021/neurips/Identification and Estimation of Joint Probabilities of Potential Outcomes in Observational Studies with Covariate Information
create mode 100644 data/2021/neurips/Identification of Partially Observed Linear Causal Models: Graphical Conditions for the Non-Gaussian and Heterogeneous Cases
create mode 100644 data/2021/neurips/Identification of the Generalized Condorcet Winner in Multi-dueling Bandits
create mode 100644 data/2021/neurips/Identifying and Benchmarking Natural Out-of-Context Prediction Problems
create mode 100644 data/2021/neurips/Identity testing for Mallows model
create mode 100644 data/2021/neurips/Image Generation using Continuous Filter Atoms
create mode 100644 data/2021/neurips/ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
create mode 100644 data/2021/neurips/Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
create mode 100644 data/2021/neurips/Imitation with Neural Density Models
create mode 100644 data/2021/neurips/Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity
create mode 100644 data/2021/neurips/Implicit Deep Adaptive Design: Policy-Based Experimental Design without Likelihoods
create mode 100644 data/2021/neurips/Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path
create mode 100644 data/2021/neurips/Implicit Generative Copulas
create mode 100644 data/2021/neurips/Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
create mode 100644 data/2021/neurips/Implicit Regularization in Matrix Sensing via Mirror Descent
create mode 100644 data/2021/neurips/Implicit SVD for Graph Representation Learning
create mode 100644 data/2021/neurips/Implicit Semantic Response Alignment for Partial Domain Adaptation
create mode 100644 data/2021/neurips/Implicit Sparse Regularization: The Impact of Depth and Early Stopping
create mode 100644 data/2021/neurips/Implicit Task-Driven Probability Discrepancy Measure for Unsupervised Domain Adaptation
create mode 100644 data/2021/neurips/Implicit Transformer Network for Screen Content Image Continuous Super-Resolution
create mode 100644 data/2021/neurips/Impression learning: Online representation learning with synaptic plasticity
create mode 100644 data/2021/neurips/Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction
create mode 100644 data/2021/neurips/Improved Coresets and Sublinear Algorithms for Power Means in Euclidean Spaces
create mode 100644 data/2021/neurips/Improved Guarantees for Offline Stochastic Matching via new Ordered Contention Resolution Schemes
create mode 100644 data/2021/neurips/Improved Learning Rates of a Functional Lasso-type SVM with Sparse Multi-Kernel Representation
create mode 100644 data/2021/neurips/Improved Regret Bounds for Tracking Experts with Memory
create mode 100644 data/2021/neurips/Improved Regularization and Robustness for Fine-tuning in Neural Networks
create mode 100644 data/2021/neurips/Improved Transformer for High-Resolution GANs
create mode 100644 data/2021/neurips/Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
create mode 100644 data/2021/neurips/Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss
create mode 100644 data/2021/neurips/Improving Calibration through the Relationship with Adversarial Robustness
create mode 100644 data/2021/neurips/Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning
create mode 100644 data/2021/neurips/Improving Compositionality of Neural Networks by Decoding Representations to Inputs
create mode 100644 data/2021/neurips/Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings
create mode 100644 data/2021/neurips/Improving Conditional Coverage via Orthogonal Quantile Regression
create mode 100644 data/2021/neurips/Improving Contrastive Learning on Imbalanced Data via Open-World Sampling
create mode 100644 data/2021/neurips/Improving Deep Learning Interpretability by Saliency Guided Training
create mode 100644 data/2021/neurips/Improving Generalization in Meta-RL with Imaginary Tasks from Latent Dynamics Mixture
create mode 100644 data/2021/neurips/Improving Robustness using Generated Data
create mode 100644 data/2021/neurips/Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration
create mode 100644 data/2021/neurips/Improving Transferability of Representations via Augmentation-Aware Self-Supervision
create mode 100644 data/2021/neurips/Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers
create mode 100644 data/2021/neurips/Improving black-box optimization in VAE latent space using decoder uncertainty
create mode 100644 data/2021/neurips/Increasing Liquid State Machine Performance with Edge-of-Chaos Dynamics Organized by Astrocyte-modulated Plasticity
create mode 100644 data/2021/neurips/Independent Prototype Propagation for Zero-Shot Compositionality
create mode 100644 data/2021/neurips/Independent mechanism analysis, a new concept?
create mode 100644 data/2021/neurips/Indexed Minimum Empirical Divergence for Unimodal Bandits
create mode 100644 "data/2021/neurips/Individual Privacy Accounting via a R\303\251nyi Filter"
create mode 100644 data/2021/neurips/Infinite Time Horizon Safety of Bayesian Neural Networks
create mode 100644 data/2021/neurips/Influence Patterns for Explaining Information Flow in BERT
create mode 100644 data/2021/neurips/InfoGCL: Information-Aware Graph Contrastive Learning
create mode 100644 data/2021/neurips/Information Directed Reward Learning for Reinforcement Learning
create mode 100644 data/2021/neurips/Information Directed Sampling for Sparse Linear Bandits
create mode 100644 data/2021/neurips/Information is Power: Intrinsic Control via Information Capture
create mode 100644 data/2021/neurips/Information-constrained optimization: can adaptive processing of gradients help?
create mode 100644 data/2021/neurips/Information-theoretic generalization bounds for black-box learning algorithms
create mode 100644 data/2021/neurips/Instance-Conditional Knowledge Distillation for Object Detection
create mode 100644 data/2021/neurips/Instance-Conditioned GAN
create mode 100644 data/2021/neurips/Instance-Dependent Bounds for Zeroth-order Lipschitz Optimization with Error Certificates
create mode 100644 data/2021/neurips/Instance-Dependent Partial Label Learning
create mode 100644 data/2021/neurips/Instance-dependent Label-noise Learning under a Structural Causal Model
create mode 100644 data/2021/neurips/Instance-optimal Mean Estimation Under Differential Privacy
create mode 100644 data/2021/neurips/Integrating Expert ODEs into Neural ODEs: Pharmacology and Disease Progression
create mode 100644 data/2021/neurips/Integrating Tree Path in Transformer for Code Representation
create mode 100644 data/2021/neurips/Interactive Label Cleaning with Example-based Explanations
create mode 100644 data/2021/neurips/Interesting Object, Curious Agent: Learning Task-Agnostic Exploration
create mode 100644 data/2021/neurips/Intermediate Layers Matter in Momentum Contrastive Self Supervised Learning
create mode 100644 data/2021/neurips/Interpolation can hurt robust generalization even when there is no noise
create mode 100644 data/2021/neurips/Interpretable agent communication from scratch (with a generic visual processor emerging on the side)
create mode 100644 data/2021/neurips/Interpreting Representation Quality of DNNs for 3D Point Cloud Processing
create mode 100644 data/2021/neurips/Interventional Sum-Product Networks: Causal Inference with Tractable Probabilistic Models
create mode 100644 data/2021/neurips/Intriguing Properties of Contrastive Losses
create mode 100644 data/2021/neurips/Intriguing Properties of Vision Transformers
create mode 100644 data/2021/neurips/Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
create mode 100644 data/2021/neurips/Introspective Distillation for Robust Question Answering
create mode 100644 data/2021/neurips/Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization
create mode 100644 data/2021/neurips/Invariant Causal Imitation Learning for Generalizable Policies
create mode 100644 data/2021/neurips/Inverse Optimal Control Adapted to the Noise Characteristics of the Human Sensorimotor System
create mode 100644 data/2021/neurips/Inverse Problems Leveraging Pre-trained Contrastive Representations
create mode 100644 data/2021/neurips/Inverse Reinforcement Learning in a Continuous State Space with Formal Guarantees
create mode 100644 data/2021/neurips/Inverse-Weighted Survival Games
create mode 100644 data/2021/neurips/Invertible DenseNets with Concatenated LipSwish
create mode 100644 data/2021/neurips/Is Automated Topic Model Evaluation Broken? The Incoherence of Coherence
create mode 100644 data/2021/neurips/Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies
create mode 100644 data/2021/neurips/It Has Potential: Gradient-Driven Denoisers for Convergent Solutions to Inverse Problems
create mode 100644 data/2021/neurips/Iterative Amortized Policy Optimization
create mode 100644 data/2021/neurips/Iterative Causal Discovery in the Possible Presence of Latent Confounders and Selection Bias
create mode 100644 data/2021/neurips/Iterative Connecting Probability Estimation for Networks
create mode 100644 data/2021/neurips/Iterative Methods for Private Synthetic Data: Unifying Framework and New Methods
create mode 100644 data/2021/neurips/Iterative Teacher-Aware Learning
create mode 100644 data/2021/neurips/Iterative Teaching by Label Synthesis
create mode 100644 data/2021/neurips/Iteratively Reweighted Least Squares for Basis Pursuit with Global Linear Convergence Rate
create mode 100644 data/2021/neurips/Joint Inference for Neural Network Depth and Dropout Regularization
create mode 100644 data/2021/neurips/Joint Modeling of Visual Objects and Relations for Scene Graph Generation
create mode 100644 data/2021/neurips/Joint Semantic Mining for Weakly Supervised RGB-D Salient Object Detection
create mode 100644 data/2021/neurips/Joint inference and input optimization in equilibrium networks
create mode 100644 data/2021/neurips/K-Net: Towards Unified Image Segmentation
create mode 100644 data/2021/neurips/K-level Reasoning for Zero-Shot Coordination in Hanabi
create mode 100644 data/2021/neurips/KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support
create mode 100644 data/2021/neurips/KS-GNN: Keywords Search over Incomplete Graphs via Graphs Neural Network
create mode 100644 data/2021/neurips/Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
create mode 100644 data/2021/neurips/Kernel Functional Optimisation
create mode 100644 data/2021/neurips/Kernel Identification Through Transformers
create mode 100644 data/2021/neurips/Kernelized Heterogeneous Risk Minimization
create mode 100644 data/2021/neurips/Knowledge-Adaptation Priors
create mode 100644 data/2021/neurips/Knowledge-inspired 3D Scene Graph Prediction in Point Cloud
create mode 100644 data/2021/neurips/L2ight: Enabling On-Chip Learning for Optical Neural Networks via Efficient in-situ Subspace Optimization
create mode 100644 data/2021/neurips/LADA: Look-Ahead Data Acquisition via Augmentation for Deep Active Learning
create mode 100644 data/2021/neurips/LEADS: Learning Dynamical Systems that Generalize Across Environments
create mode 100644 data/2021/neurips/LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes
create mode 100644 data/2021/neurips/LSH-SMILE: Locality Sensitive Hashing Accelerated Simulation and Learning
create mode 100644 data/2021/neurips/Label Disentanglement in Partition-based Extreme Multilabel Classification
create mode 100644 data/2021/neurips/Label Noise SGD Provably Prefers Flat Global Minimizers
create mode 100644 data/2021/neurips/Label consistency in overfitted generalized $k$-means
create mode 100644 data/2021/neurips/Label-Imbalanced and Group-Sensitive Classification under Overparameterization
create mode 100644 data/2021/neurips/Labeling Trick: A Theory of Using Graph Neural Networks for Multi-Node Representation Learning
create mode 100644 data/2021/neurips/Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning
create mode 100644 data/2021/neurips/Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision
create mode 100644 data/2021/neurips/Landscape analysis of an improved power method for tensor decomposition
create mode 100644 data/2021/neurips/Language models enable zero-shot prediction of the effects of mutations on protein function
create mode 100644 data/2021/neurips/Laplace Redux - Effortless Bayesian Deep Learning
create mode 100644 data/2021/neurips/Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods
create mode 100644 data/2021/neurips/Large-Scale Learning with Fourier Features and Tensor Decompositions
create mode 100644 data/2021/neurips/Large-Scale Unsupervised Object Discovery
create mode 100644 data/2021/neurips/Large-Scale Wasserstein Gradient Flows
create mode 100644 data/2021/neurips/Last iterate convergence of SGD for Least-Squares in the Interpolation regime
create mode 100644 data/2021/neurips/Last-iterate Convergence in Extensive-Form Games
create mode 100644 data/2021/neurips/Latent Equilibrium: Arbitrarily fast computation with arbitrarily slow neurons
create mode 100644 data/2021/neurips/Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages
create mode 100644 data/2021/neurips/Latent Matters: Learning Deep State-Space Models
create mode 100644 data/2021/neurips/Lattice partition recovery with dyadic CART
create mode 100644 data/2021/neurips/Learnability of Linear Thresholds from Label Proportions
create mode 100644 data/2021/neurips/Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding
create mode 100644 data/2021/neurips/Learned Robust PCA: A Scalable Deep Unfolding Approach for High-Dimensional Outlier Detection
create mode 100644 data/2021/neurips/Learning 3D Dense Correspondence via Canonical Point Autoencoder
create mode 100644 data/2021/neurips/Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations
create mode 100644 data/2021/neurips/Learning Causal Semantic Representation for Out-of-Distribution Prediction
create mode 100644 data/2021/neurips/Learning Collaborative Policies to Solve NP-hard Routing Problems
create mode 100644 data/2021/neurips/Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM)
create mode 100644 data/2021/neurips/Learning Conjoint Attentions for Graph Neural Nets
create mode 100644 data/2021/neurips/Learning Debiased Representation via Disentangled Feature Augmentation
create mode 100644 data/2021/neurips/Learning Debiased and Disentangled Representations for Semantic Segmentation
create mode 100644 data/2021/neurips/Learning Disentangled Behavior Embeddings
create mode 100644 data/2021/neurips/Learning Distilled Collaboration Graph for Multi-Agent Perception
create mode 100644 data/2021/neurips/Learning Diverse Policies in MOBA Games via Macro-Goals
create mode 100644 data/2021/neurips/Learning Domain Invariant Representations in Goal-conditioned Block MDPs
create mode 100644 data/2021/neurips/Learning Dynamic Graph Representation of Brain Connectome with Spatio-Temporal Attention
create mode 100644 data/2021/neurips/Learning Equilibria in Matching Markets from Bandit Feedback
create mode 100644 data/2021/neurips/Learning Equivariant Energy Based Models with Equivariant Stein Variational Gradient Descent
create mode 100644 data/2021/neurips/Learning Fast-Inference Bayesian Networks
create mode 100644 data/2021/neurips/Learning Frequency Domain Approximation for Binary Neural Networks
create mode 100644 data/2021/neurips/Learning Gaussian Mixtures with Generalized Linear Models: Precise Asymptotics in High-dimensions
create mode 100644 data/2021/neurips/Learning Generalized Gumbel-max Causal Mechanisms
create mode 100644 data/2021/neurips/Learning Generative Vision Transformer with Energy-Based Latent Space for Saliency Prediction
create mode 100644 data/2021/neurips/Learning Graph Cellular Automata
create mode 100644 data/2021/neurips/Learning Graph Models for Retrosynthesis Prediction
create mode 100644 data/2021/neurips/Learning Hard Optimization Problems: A Data Generation Perspective
create mode 100644 data/2021/neurips/Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence
create mode 100644 data/2021/neurips/Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach
create mode 100644 data/2021/neurips/Learning Knowledge Graph-based World Models of Textual Environments
create mode 100644 data/2021/neurips/Learning Large Neighborhood Search Policy for Integer Programming
create mode 100644 data/2021/neurips/Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning
create mode 100644 data/2021/neurips/Learning Markov State Abstractions for Deep Reinforcement Learning
create mode 100644 data/2021/neurips/Learning Models for Actionable Recourse
create mode 100644 data/2021/neurips/Learning Nonparametric Volterra Kernels with Gaussian Processes
create mode 100644 data/2021/neurips/Learning Optimal Predictive Checklists
create mode 100644 data/2021/neurips/Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs
create mode 100644 data/2021/neurips/Learning Riemannian metric for disease progression modeling
create mode 100644 data/2021/neurips/Learning Robust Hierarchical Patterns of Human Brain across Many fMRI Studies
create mode 100644 data/2021/neurips/Learning Semantic Representations to Verify Hardware Designs
create mode 100644 data/2021/neurips/Learning Signal-Agnostic Manifolds of Neural Fields
create mode 100644 data/2021/neurips/Learning Space Partitions for Path Planning
create mode 100644 data/2021/neurips/Learning Stable Deep Dynamics Models for Partially Observed or Delayed Dynamical Systems
create mode 100644 data/2021/neurips/Learning State Representations from Random Deep Action-conditional Predictions
create mode 100644 data/2021/neurips/Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound
create mode 100644 data/2021/neurips/Learning Student-Friendly Teacher Networks for Knowledge Distillation
create mode 100644 data/2021/neurips/Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural Networks
create mode 100644 data/2021/neurips/Learning Transferable Adversarial Perturbations
create mode 100644 data/2021/neurips/Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training
create mode 100644 data/2021/neurips/Learning Treatment Effects in Panels with General Intervention Patterns
create mode 100644 data/2021/neurips/Learning Tree Interpretation from Object Representation for Deep Reinforcement Learning
create mode 100644 data/2021/neurips/Learning a Single Neuron with Bias Using
Gradient Descent create mode 100644 data/2021/neurips/Learning and Generalization in RNNs create mode 100644 data/2021/neurips/Learning curves of generic features maps for realistic datasets with a teacher-student model create mode 100644 data/2021/neurips/Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering create mode 100644 data/2021/neurips/Learning in Multi-Stage Decentralized Matching Markets create mode 100644 data/2021/neurips/Learning in Non-Cooperative Configurable Markov Decision Processes create mode 100644 data/2021/neurips/Learning in two-player zero-sum partially observable Markov games with perfect recall create mode 100644 data/2021/neurips/Learning interaction rules from multi-animal trajectories via augmented behavioral models create mode 100644 data/2021/neurips/Learning latent causal graphs via mixture oracles create mode 100644 data/2021/neurips/Learning on Random Balls is Sufficient for Estimating (Some) Graph Parameters create mode 100644 data/2021/neurips/Learning rule influences recurrent network representations but not attractor structure in decision-making tasks create mode 100644 data/2021/neurips/Learning the optimal Tikhonov regularizer for inverse problems create mode 100644 data/2021/neurips/Learning to Adapt via Latent Domains for Adaptive Semantic Segmentation create mode 100644 data/2021/neurips/Learning to Assimilate in Chaotic Dynamical Systems create mode 100644 data/2021/neurips/Learning to Combine Per-Example Solutions for Neural Program Synthesis create mode 100644 data/2021/neurips/Learning to Compose Visual Relations create mode 100644 data/2021/neurips/Learning to Draw: Emergent Communication through Sketching create mode 100644 data/2021/neurips/Learning to Elect create mode 100644 data/2021/neurips/Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics create mode 100644 data/2021/neurips/Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware Adversarial Training create mode 100644 data/2021/neurips/Learning to Generate Visual Questions with Noisy Supervision create mode 100644 data/2021/neurips/Learning to Ground Multi-Agent Communication with Autoencoders create mode 100644 data/2021/neurips/Learning to Iteratively Solve Routing Problems with Dual-Aspect Collaborative Transformer create mode 100644 data/2021/neurips/Learning to Learn Dense Gaussian Processes for Few-Shot Learning create mode 100644 data/2021/neurips/Learning to Learn Graph Topologies create mode 100644 data/2021/neurips/Learning to Predict Trustworthiness with Steep Slope Loss create mode 100644 data/2021/neurips/Learning to Schedule Heuristics in Branch and Bound create mode 100644 data/2021/neurips/Learning to See by Looking at Noise create mode 100644 data/2021/neurips/Learning to Select Exogenous Events for Marked Temporal Point Process create mode 100644 data/2021/neurips/Learning to Synthesize Programs as Interpretable and Generalizable Policies create mode 100644 data/2021/neurips/Learning to Time-Decode in Spiking Neural Networks Through the Information Bottleneck create mode 100644 data/2021/neurips/Learning to dehaze with polarization create mode 100644 data/2021/neurips/Learning to delegate for large-scale vehicle routing create mode 100644 data/2021/neurips/Learning where to learn: Gradient sparsity in meta and continual learning create mode 100644 data/2021/neurips/Learning with Algorithmic Supervision via Continuous Relaxations create mode 100644 
data/2021/neurips/Learning with Holographic Reduced Representations create mode 100644 data/2021/neurips/Learning with Labeling Induced Abstentions create mode 100644 data/2021/neurips/Learning with Noisy Correspondence for Cross-modal Matching create mode 100644 data/2021/neurips/Learning with User-Level Privacy create mode 100644 data/2021/neurips/Learning-Augmented Dynamic Power Management with Multiple States via New Ski Rental Bounds create mode 100644 data/2021/neurips/Learning-to-learn non-convex piecewise-Lipschitz functions create mode 100644 data/2021/neurips/Least Square Calibration for Peer Reviews create mode 100644 data/2021/neurips/Leveraging Distribution Alignment via Stein Path for Cross-Domain Cold-Start Recommendation create mode 100644 data/2021/neurips/Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces create mode 100644 data/2021/neurips/Leveraging SE(3) Equivariance for Self-supervised Category-Level Object Pose Estimation from Point Clouds create mode 100644 data/2021/neurips/Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation create mode 100644 data/2021/neurips/Leveraging the Inductive Bias of Large Language Models for Abstract Textual Reasoning create mode 100644 data/2021/neurips/Lifelong Domain Adaptation via Consolidated Internal Distribution create mode 100644 data/2021/neurips/Light Field Networks: Neural Scene Representations with Single-Evaluation Rendering create mode 100644 data/2021/neurips/Limiting fluctuation and trajectorial stability of multilayer neural networks with mean field training create mode 100644 data/2021/neurips/Linear Convergence in Federated Learning: Tackling Client Heterogeneity and Sparse Gradients create mode 100644 data/2021/neurips/Linear Convergence of Gradient Methods for Estimating Structured Transition Matrices in High-dimensional Vector Autoregressive Models create mode 100644 data/2021/neurips/Linear and Kernel Classification in the Streaming Model: Improved Bounds for Heavy Hitters create mode 100644 data/2021/neurips/Linear-Time Probabilistic Solution of Boundary Value Problems create mode 100644 data/2021/neurips/Lip to Speech Synthesis with Visual Context Attentional GAN create mode 100644 data/2021/neurips/List-Decodable Mean Estimation in Nearly-PCA Time create mode 100644 data/2021/neurips/Littlestone Classes are Privately Online Learnable create mode 100644 data/2021/neurips/Local Differential Privacy for Regret Minimization in Reinforcement Learning create mode 100644 data/2021/neurips/Local Disentanglement in Variational Auto-Encoders Using Jacobian $L_1$ Regularization create mode 100644 data/2021/neurips/Local Explanation of Dialogue Response Generation create mode 100644 data/2021/neurips/Local Hyper-Flow Diffusion create mode 100644 data/2021/neurips/Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels create mode 100644 data/2021/neurips/Local plasticity rules can learn deep representations using self-supervised contrastive predictions create mode 100644 data/2021/neurips/Local policy search with Bayesian optimization create mode 100644 data/2021/neurips/Locality Sensitive Teaching create mode 100644 data/2021/neurips/Locality defeats the curse of dimensionality in convolutional teacher-student scenarios create mode 100644 data/2021/neurips/Localization with Sampling-Argmax create mode 100644 data/2021/neurips/Localization, Convexity, and Star Aggregation create mode 100644 data/2021/neurips/Locally Most Powerful 
Bayesian Test for Out-of-Distribution Detection using Deep Generative Models create mode 100644 data/2021/neurips/Locally Valid and Discriminative Prediction Intervals for Deep Learning Models create mode 100644 data/2021/neurips/Locally differentially private estimation of functionals of discrete distributions create mode 100644 data/2021/neurips/Locally private online change point detection create mode 100644 data/2021/neurips/Logarithmic Regret from Sublinear Hints create mode 100644 data/2021/neurips/Logarithmic Regret in Feature-based Dynamic Pricing create mode 100644 data/2021/neurips/Long Short-Term Transformer for Online Action Detection create mode 100644 data/2021/neurips/Long-Short Transformer: Efficient Transformers for Language and Vision create mode 100644 data/2021/neurips/Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos create mode 100644 data/2021/neurips/Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis create mode 100644 data/2021/neurips/Looking Beyond Single Images for Contrastive Semantic Segmentation Learning create mode 100644 data/2021/neurips/Loss function based second-order Jensen inequality and its application to particle variational inference create mode 100644 data/2021/neurips/Lossy Compression for Lossless Prediction create mode 100644 data/2021/neurips/Low-Fidelity Video Encoder Optimization for Temporal Action Localization create mode 100644 data/2021/neurips/Low-Rank Constraints for Fast Inference in Structured Models create mode 100644 data/2021/neurips/Low-Rank Extragradient Method for Nonsmooth and Low-Rank Matrix Optimization Problems create mode 100644 data/2021/neurips/Low-Rank Subspaces in GANs create mode 100644 data/2021/neurips/Low-dimensional Structure in the Space of Language Representations is Reflected in Brain Responses create mode 100644 data/2021/neurips/Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks create mode 100644 data/2021/neurips/Lower Bounds on Metropolized Sampling Methods for Well-Conditioned Distributions create mode 100644 data/2021/neurips/Lower and Upper Bounds on the Pseudo-Dimension of Tensor Network Models create mode 100644 data/2021/neurips/Luna: Linear Unified Nested Attention create mode 100644 data/2021/neurips/M-FAC: Efficient Matrix-Free Approximations of Second-Order Information create mode 100644 data/2021/neurips/MADE: Exploration via Maximizing Deviation from Explored Regions create mode 100644 data/2021/neurips/MAP Propagation Algorithm: Faster Learning with a Team of Reinforcement Learning Agents create mode 100644 data/2021/neurips/MAU: A Motion-Aware Unit for Video Prediction and Beyond create mode 100644 data/2021/neurips/MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers create mode 100644 data/2021/neurips/MCMC Variational Inference via Uncorrected Hamiltonian Annealing create mode 100644 data/2021/neurips/MERLOT: Multimodal Neural Script Knowledge Models create mode 100644 data/2021/neurips/MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge create mode 100644 data/2021/neurips/MICo: Improved representations via sampling-based state similarity for Markov decision processes create mode 100644 data/2021/neurips/MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms create mode 100644 data/2021/neurips/MLP-Mixer: An all-MLP Architecture for Vision create mode 100644 
data/2021/neurips/MOMA: Multi-Object Multi-Actor Activity Parsing create mode 100644 data/2021/neurips/MST: Masked Self-Supervised Transformer for Visual Representation create mode 100644 data/2021/neurips/Machine Learning for Variance Reduction in Online Experiments create mode 100644 data/2021/neurips/Machine learning structure preserving brackets for forecasting irreversible processes create mode 100644 data/2021/neurips/Machine versus Human Attention in Deep Reinforcement Learning Tasks create mode 100644 data/2021/neurips/MagNet: A Neural Network for Directed Graphs create mode 100644 data/2021/neurips/Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications create mode 100644 data/2021/neurips/Making a (Counterfactual) Difference One Rationale at a Time create mode 100644 data/2021/neurips/Making the most of your day: online learning for optimal allocation of time create mode 100644 data/2021/neurips/Manifold Topology Divergence: a Framework for Comparing Data Manifolds create mode 100644 data/2021/neurips/Manipulating SGD with Data Ordering Attacks create mode 100644 data/2021/neurips/Margin-Independent Online Multiclass Learning via Convex Geometry create mode 100644 data/2021/neurips/Marginalised Gaussian Processes with Nested Sampling create mode 100644 data/2021/neurips/MarioNette: Self-Supervised Sprite Learning create mode 100644 data/2021/neurips/Mastering Atari Games with Limited Data create mode 100644 data/2021/neurips/Matching a Desired Causal State via Shift Interventions create mode 100644 data/2021/neurips/Matrix encoding networks for neural combinatorial optimization create mode 100644 data/2021/neurips/Matrix factorisation and the interpretation of geodesic distance create mode 100644 data/2021/neurips/Maximum Likelihood Training of Score-Based Diffusion Models create mode 100644 data/2021/neurips/Measuring Generalization with Optimal Transport create mode 100644 data/2021/neurips/Medical Dead-ends and Learning to Identify High-Risk States and Treatments create mode 100644 data/2021/neurips/Memory Efficient Meta-Learning with Large Images create mode 100644 data/2021/neurips/Memory-Efficient Approximation Algorithms for Max-k-Cut and Correlation Clustering create mode 100644 data/2021/neurips/Memory-efficient Patch-based Inference for Tiny Deep Learning create mode 100644 data/2021/neurips/Meta Internal Learning create mode 100644 data/2021/neurips/Meta Learning Backpropagation And Improving It create mode 100644 data/2021/neurips/Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data create mode 100644 data/2021/neurips/Meta-Adaptive Nonlinear Control: Theory and Algorithms create mode 100644 data/2021/neurips/Meta-Learning Reliable Priors in the Function Space create mode 100644 data/2021/neurips/Meta-Learning Sparse Implicit Neural Representations create mode 100644 data/2021/neurips/Meta-Learning for Relative Density-Ratio Estimation create mode 100644 data/2021/neurips/Meta-Learning the Search Distribution of Black-Box Random Search Based Adversarial Attacks create mode 100644 data/2021/neurips/Meta-Learning via Learning with Distributed Memory create mode 100644 data/2021/neurips/Meta-learning to Improve Pre-training create mode 100644 data/2021/neurips/Meta-learning with an Adaptive Task Scheduler create mode 100644 data/2021/neurips/MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images create mode 100644 data/2021/neurips/Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models create 
mode 100644 data/2021/neurips/Metropolis-Hastings Data Augmentation for Graph Neural Networks create mode 100644 data/2021/neurips/Mind the Gap: Assessing Temporal Generalization in Neural Language Models create mode 100644 data/2021/neurips/Mini-Batch Consistent Slot Set Encoder for Scalable Set Encoding create mode 100644 data/2021/neurips/Minibatch and Momentum Model-based Methods for Stochastic Weakly Convex Optimization create mode 100644 data/2021/neurips/Minimax Optimal Quantile and Semi-Adversarial Regret via Root-Logarithmic Regularizers create mode 100644 data/2021/neurips/Minimax Regret for Stochastic Shortest Path create mode 100644 data/2021/neurips/Minimizing Polarization and Disagreement in Social Networks via Link Recommendation create mode 100644 data/2021/neurips/Mining the Benefits of Two-stage and One-stage HOI Detection create mode 100644 data/2021/neurips/Mirror Langevin Monte Carlo: the Case Under Isoperimetry create mode 100644 data/2021/neurips/Misspecified Gaussian Process Bandit Optimization create mode 100644 data/2021/neurips/Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage create mode 100644 data/2021/neurips/Mitigating Forgetting in Online Continual Learning with Neuron Calibration create mode 100644 data/2021/neurips/MixACM: Mixup-Based Robustness Transfer via Distillation of Activated Channel Maps create mode 100644 data/2021/neurips/MixSeq: Connecting Macroscopic Time Series Forecasting with Microscopic Time Series Data create mode 100644 data/2021/neurips/Mixability made efficient: Fast online multiclass logistic regression create mode 100644 data/2021/neurips/Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity create mode 100644 data/2021/neurips/Mixture Proportion Estimation and PU Learning: A Modern Approach create mode 100644 data/2021/neurips/Mixture weights optimisation for Alpha-Divergence Variational Inference create mode 100644 data/2021/neurips/MobILE: Model-Based Imitation Learning From Observation Alone create mode 100644 data/2021/neurips/MobTCast: Leveraging Auxiliary Trajectory Forecasting for Human Mobility Prediction create mode 100644 data/2021/neurips/Modality-Agnostic Topology Aware Localization create mode 100644 data/2021/neurips/Model Adaptation: Historical Contrastive Learning for Unsupervised Domain Adaptation without Source Data create mode 100644 data/2021/neurips/Model Selection for Bayesian Autoencoders create mode 100644 data/2021/neurips/Model, sample, and epoch-wise descents: exact solution of gradient flow in the random feature model create mode 100644 data/2021/neurips/Model-Based Domain Generalization create mode 100644 data/2021/neurips/Model-Based Episodic Memory Induces Dynamic Hybrid Controls create mode 100644 data/2021/neurips/Model-Based Reinforcement Learning via Imagination with Derived Memory create mode 100644 data/2021/neurips/Modeling Heterogeneous Hierarchies with Relation-specific Hyperbolic Cones create mode 100644 data/2021/neurips/Modified Frank Wolfe in Probability Space create mode 100644 data/2021/neurips/Modular Gaussian Processes for Transfer Learning create mode 100644 data/2021/neurips/Momentum Centering and Asynchronous Update for Adaptive Gradient Methods create mode 100644 data/2021/neurips/Monte Carlo Tree Search With Iteratively Refining State Abstractions create mode 100644 "data/2021/neurips/Mori\303\251 Attack (MA): A New Potential Risk of Screen Photos" create mode 100644 data/2021/neurips/Mosaicking to Distill: 
Knowledge Distillation from Out-of-Domain Data create mode 100644 data/2021/neurips/Moser Flow: Divergence-based Generative Modeling on Manifolds create mode 100644 data/2021/neurips/Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices create mode 100644 data/2021/neurips/Motif-based Graph Self-Supervised Learning for Molecular Property Prediction create mode 100644 data/2021/neurips/Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks create mode 100644 data/2021/neurips/Multi-Agent Reinforcement Learning in Stochastic Networked Systems create mode 100644 data/2021/neurips/Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization create mode 100644 data/2021/neurips/Multi-Facet Clustering Variational Autoencoders create mode 100644 data/2021/neurips/Multi-Label Learning with Pairwise Relevance Ordering create mode 100644 data/2021/neurips/Multi-Objective Meta Learning create mode 100644 data/2021/neurips/Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs create mode 100644 data/2021/neurips/Multi-Person 3D Motion Prediction with Multi-Range Transformers create mode 100644 data/2021/neurips/Multi-Scale Representation Learning on Proteins create mode 100644 data/2021/neurips/Multi-Step Budgeted Bayesian Optimization with Unknown Evaluation Costs create mode 100644 data/2021/neurips/Multi-View Representation Learning via Total Correlation Objective create mode 100644 data/2021/neurips/Multi-armed Bandit Requiring Monotone Arm Sequences create mode 100644 data/2021/neurips/Multi-modal Dependency Tree for Video Captioning create mode 100644 data/2021/neurips/Multi-task Learning of Order-Consistent Causal Graphs create mode 100644 data/2021/neurips/Multi-view Contrastive Graph Clustering create mode 100644 data/2021/neurips/Multiclass Boosting and the Cost of Weak Learning create mode 100644 data/2021/neurips/Multiclass versus Binary Differentially Private PAC Learning create mode 100644 data/2021/neurips/Multilingual Pre-training with Universal Dependency Learning create mode 100644 data/2021/neurips/Multimodal Few-Shot Learning with Frozen Language Models create mode 100644 data/2021/neurips/Multimodal Virtual Point 3D Detection create mode 100644 data/2021/neurips/Multimodal and Multilingual Embeddings for Large-Scale Speech Mining create mode 100644 data/2021/neurips/Multiple Descent: Design Your Own Generalization Curve create mode 100644 data/2021/neurips/Multiwavelet-based Operator Learning for Differential Equations create mode 100644 data/2021/neurips/NAS-Bench-x11 and the Power of Learning Curves create mode 100644 data/2021/neurips/NEO: Non Equilibrium Sampling on the Orbits of a Deterministic Transform create mode 100644 data/2021/neurips/NN-Baker: A Neural-network Infused Algorithmic Framework for Optimization Problems on Geometric Intersection Graphs create mode 100644 data/2021/neurips/NORESQA: A Framework for Speech Quality Assessment using Non-Matching References create mode 100644 data/2021/neurips/NTopo: Mesh-free Topology Optimization using Implicit Neural Representations create mode 100644 data/2021/neurips/Navigating to the Best Policy in Markov Decision Processes create mode 100644 data/2021/neurips/NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild create mode 100644 data/2021/neurips/NeRV: Neural Representations for Videos create mode 100644 
data/2021/neurips/Near Optimal Policy Optimization via REPS create mode 100644 data/2021/neurips/Near-Optimal Lower Bounds For Convex Optimization For All Orders of Smoothness create mode 100644 data/2021/neurips/Near-Optimal Multi-Perturbation Experimental Design for Causal Structure Learning create mode 100644 data/2021/neurips/Near-Optimal No-Regret Learning in General Games create mode 100644 data/2021/neurips/Near-Optimal Offline Reinforcement Learning via Double Variance Reduction create mode 100644 data/2021/neurips/Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems create mode 100644 data/2021/neurips/Nearly Horizon-Free Offline Reinforcement Learning create mode 100644 data/2021/neurips/Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs create mode 100644 data/2021/neurips/Nearly-Tight and Oblivious Algorithms for Explainable Clustering create mode 100644 data/2021/neurips/Necessary and sufficient graphical conditions for optimal adjustment sets in causal graphical models with hidden variables create mode 100644 data/2021/neurips/Neighborhood Reconstructing Autoencoders create mode 100644 data/2021/neurips/Neo-GNNs: Neighborhood Overlap-aware Graph Neural Networks for Link Prediction create mode 100644 data/2021/neurips/Nested Counterfactual Identification from Arbitrary Surrogate Experiments create mode 100644 data/2021/neurips/Nested Graph Neural Networks create mode 100644 data/2021/neurips/Nested Variational Inference create mode 100644 data/2021/neurips/Network-to-Network Regularization: Enforcing Occam's Razor to Improve Generalization create mode 100644 data/2021/neurips/NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction create mode 100644 data/2021/neurips/NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL create mode 100644 data/2021/neurips/Neural Active Learning with Performance Guarantees create mode 100644 data/2021/neurips/Neural Additive Models: Interpretable Machine Learning with Neural Nets create mode 100644 data/2021/neurips/Neural Algorithmic Reasoners are Implicit Planners create mode 100644 data/2021/neurips/Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations create mode 100644 data/2021/neurips/Neural Auto-Curricula in Two-Player Zero-Sum Games create mode 100644 data/2021/neurips/Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction create mode 100644 data/2021/neurips/Neural Bootstrapper create mode 100644 data/2021/neurips/Neural Circuit Synthesis from Specification Patterns create mode 100644 data/2021/neurips/Neural Distance Embeddings for Biological Sequences create mode 100644 data/2021/neurips/Neural Dubber: Dubbing for Videos According to Scripts create mode 100644 data/2021/neurips/Neural Ensemble Search for Uncertainty Estimation and Dataset Shift create mode 100644 data/2021/neurips/Neural Flows: Efficient Alternative to Neural ODEs create mode 100644 data/2021/neurips/Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering create mode 100644 data/2021/neurips/Neural Hybrid Automata: Learning Dynamics With Multiple Modes and Stochastic Transitions create mode 100644 data/2021/neurips/Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception create mode 100644 data/2021/neurips/Neural Production Systems create mode 100644 data/2021/neurips/Neural Program Generation Modulo Static Analysis create mode 
100644 data/2021/neurips/Neural Pseudo-Label Optimism for the Bank Loan Problem create mode 100644 data/2021/neurips/Neural Regression, Representational Similarity, Model Zoology & Neural Taskonomy at Scale in Rodent Visual Cortex create mode 100644 data/2021/neurips/Neural Relightable Participating Media Rendering create mode 100644 data/2021/neurips/Neural Routing by Memory create mode 100644 data/2021/neurips/Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation create mode 100644 data/2021/neurips/Neural Scene Flow Prior create mode 100644 data/2021/neurips/Neural Symplectic Form: Learning Hamiltonian Equations on General Coordinate Systems create mode 100644 data/2021/neurips/Neural Tangent Kernel Maximum Mean Discrepancy create mode 100644 data/2021/neurips/Neural Trees for Learning on Graphs create mode 100644 data/2021/neurips/Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose create mode 100644 data/2021/neurips/Neural optimal feedback control with local learning rules create mode 100644 data/2021/neurips/Neural-PIL: Neural Pre-Integrated Lighting for Reflectance Decomposition create mode 100644 data/2021/neurips/NeuroLKH: Combining Deep Learning Model with Lin-Kernighan-Helsgaun Heuristic for Solving the Traveling Salesman Problem create mode 100644 data/2021/neurips/NeuroMLR: Robust & Reliable Route Recommendation on Road Networks create mode 100644 data/2021/neurips/Never Go Full Batch (in Stochastic Convex Optimization) create mode 100644 data/2021/neurips/Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update create mode 100644 data/2021/neurips/No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data create mode 100644 data/2021/neurips/No RL, No Simulation: Learning to Navigate without Navigating create mode 100644 data/2021/neurips/No Regrets for Learning the Prior in Bandits create mode 100644 data/2021/neurips/No-Press Diplomacy from Scratch create mode 100644 data/2021/neurips/No-regret Online Learning over Riemannian Manifolds create mode 100644 data/2021/neurips/Node Dependent Local Smoothing for Scalable Graph Learning create mode 100644 data/2021/neurips/Noether Networks: meta-learning useful conserved quantities create mode 100644 data/2021/neurips/Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks create mode 100644 data/2021/neurips/Noise2Score: Tweedie's Approach to Self-Supervised Image Denoising without Clean Images create mode 100644 "data/2021/neurips/Noisy Adaptation Generates L\303\251vy Flights in Attractor Neural Networks" create mode 100644 data/2021/neurips/Noisy Recurrent Neural Networks create mode 100644 data/2021/neurips/Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation create mode 100644 data/2021/neurips/Non-Gaussian Gaussian Processes for Few-Shot Regression create mode 100644 data/2021/neurips/Non-approximate Inference for Collective Graphical Models on Path Graphs via Discrete Difference of Convex Algorithm create mode 100644 data/2021/neurips/Non-asymptotic Error Bounds for Bidirectional GANs create mode 100644 data/2021/neurips/Non-asymptotic convergence bounds for Wasserstein approximation using point clouds create mode 100644 data/2021/neurips/Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis create mode 100644 data/2021/neurips/Non-local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation create mode 100644 
data/2021/neurips/Nonparametric estimation of continuous DPPs with kernel methods create mode 100644 data/2021/neurips/Nonsmooth Implicit Differentiation for Machine-Learning and Optimization create mode 100644 data/2021/neurips/Nonuniform Negative Sampling and Log Odds Correction with Rare Events Data create mode 100644 data/2021/neurips/Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition create mode 100644 data/2021/neurips/Not All Low-Pass Filters are Robust in Graph Convolutional Networks create mode 100644 data/2021/neurips/Novel Upper Bounds for the Constrained Most Probable Explanation Task create mode 100644 data/2021/neurips/Novel Visual Category Discovery with Dual Ranking Statistics and Mutual Knowledge Distillation create mode 100644 data/2021/neurips/NovelD: A Simple yet Effective Exploration Criterion create mode 100644 data/2021/neurips/Numerical Composition of Differential Privacy create mode 100644 data/2021/neurips/Numerical influence of ReLU'(0) on backpropagation create mode 100644 data/2021/neurips/NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM create mode 100644 data/2021/neurips/OSOA: One-Shot Online Adaptation of Deep Generative Models for Lossless Compression create mode 100644 data/2021/neurips/Object DGCNN: 3D Object Detection using Dynamic Graphs create mode 100644 data/2021/neurips/Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning create mode 100644 data/2021/neurips/Object-Centric Representation Learning with Generative Spatial-Temporal Factorization create mode 100644 data/2021/neurips/Object-aware Contrastive Learning for Debiased Scene Representation create mode 100644 data/2021/neurips/Observation-Free Attacks on Stochastic Bandits create mode 100644 data/2021/neurips/OctField: Hierarchical Implicit Functions for 3D Modeling create mode 100644 data/2021/neurips/Off-Policy Risk Assessment in Contextual Bandits create mode 100644 data/2021/neurips/Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration create mode 100644 data/2021/neurips/Offline Meta Reinforcement Learning - Identifiability Challenges and Effective Data Collection Strategies create mode 100644 data/2021/neurips/Offline Model-based Adaptable Policy Learning create mode 100644 data/2021/neurips/Offline RL Without Off-Policy Evaluation create mode 100644 data/2021/neurips/Offline Reinforcement Learning as One Big Sequence Modeling Problem create mode 100644 data/2021/neurips/Offline Reinforcement Learning with Reverse Model-based Imagination create mode 100644 data/2021/neurips/On Blame Attribution for Accountable Multi-Agent Sequential Decision Making create mode 100644 data/2021/neurips/On Calibration and Out-of-Domain Generalization create mode 100644 data/2021/neurips/On Component Interactions in Two-Stage Recommender Systems create mode 100644 data/2021/neurips/On Contrastive Representations of Stochastic Processes create mode 100644 data/2021/neurips/On Density Estimation with Diffusion Models create mode 100644 data/2021/neurips/On Effective Scheduling of Model-based Reinforcement Learning create mode 100644 data/2021/neurips/On Empirical Risk Minimization with Dependent and Heavy-Tailed Data create mode 100644 data/2021/neurips/On Episodes, Prototypical Networks, and Few-Shot Learning create mode 100644 data/2021/neurips/On Inductive Biases for Heterogeneous Treatment Effect Estimation create mode 100644 data/2021/neurips/On 
Interaction Between Augmentations and Corruptions in Natural Corruption Robustness create mode 100644 data/2021/neurips/On Joint Learning for Solving Placement and Routing in Chip Design create mode 100644 data/2021/neurips/On Large-Cohort Training for Federated Learning create mode 100644 data/2021/neurips/On Learning Domain-Invariant Representations for Transfer Learning with Multiple Sources create mode 100644 data/2021/neurips/On Linear Stability of SGD and Input-Smoothness of Neural Networks create mode 100644 data/2021/neurips/On Locality of Local Explanation Models create mode 100644 data/2021/neurips/On Margin-Based Cluster Recovery with Oracle Queries create mode 100644 data/2021/neurips/On Memorization in Probabilistic Deep Generative Models create mode 100644 data/2021/neurips/On Model Calibration for Long-Tailed Object Detection and Instance Segmentation create mode 100644 data/2021/neurips/On Optimal Interpolation in Linear Regression create mode 100644 data/2021/neurips/On Optimal Robustness to Adversarial Corruption in Online Decision Problems create mode 100644 data/2021/neurips/On Path Integration of Grid Cells: Group Representation and Isotropic Scaling create mode 100644 data/2021/neurips/On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations create mode 100644 data/2021/neurips/On Plasticity, Invariance, and Mutually Frozen Weights in Sequential Task Learning create mode 100644 data/2021/neurips/On Provable Benefits of Depth in Training Graph Convolutional Networks create mode 100644 data/2021/neurips/On Riemannian Optimization over Positive Definite Matrices with the Bures-Wasserstein Geometry create mode 100644 data/2021/neurips/On Robust Optimal Transport: Computational Complexity and Barycenter Computation create mode 100644 data/2021/neurips/On Success and Simplicity: A Second Look at Transferable Targeted Attacks create mode 100644 data/2021/neurips/On The Structure of Parametric Tournaments with Application to Ranking from Pairwise Comparisons create mode 100644 data/2021/neurips/On Training Implicit Models create mode 100644 data/2021/neurips/On UMAP's True Loss Function create mode 100644 data/2021/neurips/On learning sparse vectors from mixture of responses create mode 100644 data/2021/neurips/On sensitivity of meta-learning to support data create mode 100644 data/2021/neurips/On the Algorithmic Stability of Adversarial Training create mode 100644 data/2021/neurips/On the Bias-Variance-Cost Tradeoff of Stochastic Optimization create mode 100644 data/2021/neurips/On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning create mode 100644 data/2021/neurips/On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method create mode 100644 data/2021/neurips/On the Convergence of Prior-Guided Zeroth-Order Optimization Algorithms create mode 100644 data/2021/neurips/On the Convergence of Step Decay Step-Size for Stochastic Optimization create mode 100644 data/2021/neurips/On the Cryptographic Hardness of Learning Single Periodic Neurons create mode 100644 data/2021/neurips/On the Equivalence between Neural Network and Support Vector Machine create mode 100644 data/2021/neurips/On the Estimation Bias in Double Q-Learning create mode 100644 data/2021/neurips/On the Existence of The Adversarial Bayes Classifier create mode 100644 data/2021/neurips/On the Expected Complexity of Maxout Networks create mode 100644 data/2021/neurips/On the Expressivity of Markov Reward create mode 100644 
data/2021/neurips/On the Frequency Bias of Generative Models create mode 100644 data/2021/neurips/On the Generative Utility of Cyclic Conditionals create mode 100644 data/2021/neurips/On the Importance of Gradients for Detecting Distributional Shifts in the Wild create mode 100644 data/2021/neurips/On the Out-of-distribution Generalization of Probabilistic Image Modelling create mode 100644 data/2021/neurips/On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay create mode 100644 data/2021/neurips/On the Power of Differentiable Learning versus PAC and SQ Learning create mode 100644 data/2021/neurips/On the Power of Edge Independent Graph Models create mode 100644 data/2021/neurips/On the Provable Generalization of Recurrent Neural Networks create mode 100644 data/2021/neurips/On the Representation Power of Set Pooling Networks create mode 100644 data/2021/neurips/On the Representation of Solutions to Elliptic PDEs in Barron Spaces create mode 100644 data/2021/neurips/On the Role of Optimization in Double Descent: A Least Squares Study create mode 100644 data/2021/neurips/On the Sample Complexity of Learning under Geometric Stability create mode 100644 data/2021/neurips/On the Sample Complexity of Privately Learning Axis-Aligned Rectangles create mode 100644 data/2021/neurips/On the Second-order Convergence Properties of Random Search Methods create mode 100644 data/2021/neurips/On the Stochastic Stability of Deep Markov Models create mode 100644 data/2021/neurips/On the Suboptimality of Thompson Sampling in High Dimensions create mode 100644 data/2021/neurips/On the Theory of Reinforcement Learning with Once-per-Episode Feedback create mode 100644 data/2021/neurips/On the Universality of Graph Neural Networks on Large Random Graphs create mode 100644 data/2021/neurips/On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) create mode 100644 data/2021/neurips/On the Value of Infinite Gradients in Variational Autoencoder Models create mode 100644 data/2021/neurips/On the Value of Interaction and Function Approximation in Imitation Learning create mode 100644 data/2021/neurips/On the Variance of the Fisher Information for Deep Learning create mode 100644 data/2021/neurips/On the interplay between data structure and loss function in classification problems create mode 100644 data/2021/neurips/One Explanation is Not Enough: Structured Attention Graphs for Image Classification create mode 100644 data/2021/neurips/One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective create mode 100644 data/2021/neurips/One More Step Towards Reality: Cooperative Bandits with Imperfect Communication create mode 100644 data/2021/neurips/One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval create mode 100644 data/2021/neurips/Online Active Learning with Surrogate Loss Functions create mode 100644 data/2021/neurips/Online Adaptation to Label Distribution Shift create mode 100644 data/2021/neurips/Online Control of Unknown Time-Varying Dynamical Systems create mode 100644 data/2021/neurips/Online Convex Optimization with Continuous Switching Constraint create mode 100644 data/2021/neurips/Online Facility Location with Multiple Advice create mode 100644 data/2021/neurips/Online Knapsack with Frequency Predictions create mode 100644 data/2021/neurips/Online Learning Of Neural Computations From Sparse Temporal Feedback create mode 100644 data/2021/neurips/Online Learning and Control of 
Complex Dynamical Systems from Sensory Input create mode 100644 data/2021/neurips/Online Learning in Periodic Zero-Sum Games create mode 100644 data/2021/neurips/Online Market Equilibrium with Application to Fair Division create mode 100644 data/2021/neurips/Online Matching in Sparse Random Graphs: Non-Asymptotic Performances of Greedy Algorithm create mode 100644 data/2021/neurips/Online Multi-Armed Bandits with Adaptive Inference create mode 100644 data/2021/neurips/Online Robust Reinforcement Learning with Model Uncertainty create mode 100644 data/2021/neurips/Online Selective Classification with Limited Feedback create mode 100644 data/2021/neurips/Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits create mode 100644 data/2021/neurips/Online Variational Filtering and Parameter Learning create mode 100644 data/2021/neurips/Online and Offline Reinforcement Learning by Planning with a Learned Model create mode 100644 data/2021/neurips/Online false discovery rate control for anomaly detection in time series create mode 100644 data/2021/neurips/Online learning in MDPs with linear function approximation and bandit feedback create mode 100644 data/2021/neurips/Only Train Once: A One-Shot Neural Network Training And Pruning Framework create mode 100644 data/2021/neurips/Open Rule Induction create mode 100644 data/2021/neurips/Open-set Label Noise Can Improve Robustness Against Inherent Label Noise create mode 100644 data/2021/neurips/OpenMatch: Open-Set Semi-supervised Learning with Open-set Consistency Regularization create mode 100644 data/2021/neurips/Optimal Algorithms for Stochastic Contextual Preference Bandits create mode 100644 data/2021/neurips/Optimal Best-Arm Identification Methods for Tail-Risk Measures create mode 100644 data/2021/neurips/Optimal Gradient-based Algorithms for Non-concave Bandit Optimization create mode 100644 data/2021/neurips/Optimal Order Simple Regret for Gaussian Process Bandits create mode 100644 data/2021/neurips/Optimal Policies Tend To Seek Power create mode 100644 data/2021/neurips/Optimal Rates for Nonparametric Density Estimation under Communication Constraints create mode 100644 data/2021/neurips/Optimal Rates for Random Order Online Optimization create mode 100644 data/2021/neurips/Optimal Sketching for Trace Estimation create mode 100644 data/2021/neurips/Optimal Underdamped Langevin MCMC Method create mode 100644 data/2021/neurips/Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings create mode 100644 data/2021/neurips/Optimal prediction of Markov chains with and without spectral gap create mode 100644 data/2021/neurips/Optimality and Stability in Federated Learning: A Game-theoretic Approach create mode 100644 data/2021/neurips/Optimality of variational inference for stochasticblock model with missing links create mode 100644 data/2021/neurips/Optimization-Based Algebraic Multigrid Coarsening Using Reinforcement Learning create mode 100644 data/2021/neurips/Optimizing Conditional Value-At-Risk of Black-Box Functions create mode 100644 data/2021/neurips/Optimizing Information-theoretical Generalization Bound via Anisotropic Noise of SGLD create mode 100644 data/2021/neurips/Optimizing Reusable Knowledge for Continual Learning via Metalearning create mode 100644 data/2021/neurips/Oracle Complexity in Nonsmooth Nonconvex Optimization create mode 100644 data/2021/neurips/Oracle-Efficient Regret Minimization in Factored MDPs with Unknown 
Structure create mode 100644 data/2021/neurips/Out-of-Distribution Generalization in Kernel Regression create mode 100644 data/2021/neurips/Outcome-Driven Reinforcement Learning via Variational Inference create mode 100644 data/2021/neurips/Overcoming Catastrophic Forgetting in Incremental Few-Shot Learning by Finding Flat Minima create mode 100644 data/2021/neurips/Overcoming the Convex Barrier for Simplex Inputs create mode 100644 data/2021/neurips/Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning create mode 100644 data/2021/neurips/Overinterpretation reveals image classification model pathologies create mode 100644 data/2021/neurips/Overparameterization Improves Robustness to Covariate Shift in High Dimensions create mode 100644 data/2021/neurips/PCA Initialization for Approximate Message Passing in Rotationally Invariant Models create mode 100644 data/2021/neurips/PDE-GCN: Novel Architectures for Graph Neural Networks Motivated by Partial Differential Equations create mode 100644 data/2021/neurips/PLUGIn: A simple algorithm for inverting generative models with recovery guarantees create mode 100644 data/2021/neurips/PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair create mode 100644 data/2021/neurips/POODLE: Improving Few-shot Learning via Penalizing Out-of-Distribution Samples create mode 100644 data/2021/neurips/PSD Representations for Effective Probability Models create mode 100644 data/2021/neurips/PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning create mode 100644 data/2021/neurips/Panoptic 3D Scene Reconstruction From a Single RGB Image create mode 100644 data/2021/neurips/ParK: Sound and Efficient Kernel Ridge Regression by Feature Space Partitions create mode 100644 data/2021/neurips/Parallel Bayesian Optimization of Multiple Noisy Objectives with Expected Hypervolume Improvement create mode 100644 data/2021/neurips/Parallel and Efficient Hierarchical k-Median Clustering create mode 100644 data/2021/neurips/Parallelizing Thompson Sampling create mode 100644 data/2021/neurips/Parameter Inference with Bifurcation Diagrams create mode 100644 data/2021/neurips/Parameter Prediction for Unseen Deep Architectures create mode 100644 data/2021/neurips/Parameter-free HE-friendly Logistic Regression create mode 100644 data/2021/neurips/Parameterized Knowledge Transfer for Personalized Federated Learning create mode 100644 data/2021/neurips/Parametric Complexity Bounds for Approximating PDEs with Neural Networks create mode 100644 data/2021/neurips/Parametrized Quantum Policies for Reinforcement Learning create mode 100644 data/2021/neurips/Pareto-Optimal Learning-Augmented Algorithms for Online Conversion Problems create mode 100644 data/2021/neurips/Partial success in closing the gap between human and machine vision create mode 100644 data/2021/neurips/PartialFed: Cross-Domain Personalized Federated Learning via Partial Initialization create mode 100644 data/2021/neurips/Particle Cloud Generation with Message Passing Generative Adversarial Networks create mode 100644 data/2021/neurips/Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis create mode 100644 data/2021/neurips/Partition and Code: learning how to compress graphs create mode 100644 data/2021/neurips/Partition-Based Formulations for Mixed-Integer Optimization of Trained ReLU Neural Networks create mode 100644 data/2021/neurips/Passive attention in artificial neural 
networks predicts human visual selectivity
 create mode 100644 data/2021/neurips/PatchGame: Learning to Signal Mid-level Patches in Referential Games
 create mode 100644 data/2021/neurips/Pay Attention to MLPs
 create mode 100644 data/2021/neurips/Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling
 create mode 100644 data/2021/neurips/Per-Pixel Classification is Not All You Need for Semantic Segmentation
 create mode 100644 data/2021/neurips/PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators
 create mode 100644 data/2021/neurips/Perceptual Score: What Data Modalities Does Your Model Perceive?
 create mode 100644 data/2021/neurips/Periodic Activation Functions Induce Stationarity
 create mode 100644 data/2021/neurips/Permutation-Invariant Variational Autoencoder for Graph-Level Representation Learning
 create mode 100644 data/2021/neurips/Permuton-induced Chinese Restaurant Process
 create mode 100644 data/2021/neurips/Personalized Federated Learning With Gaussian Processes
 create mode 100644 data/2021/neurips/Perturb-and-max-product: Sampling and learning in discrete energy-based models
 create mode 100644 data/2021/neurips/Perturbation Theory for the Information Bottleneck
 create mode 100644 data/2021/neurips/Perturbation-based Regret Analysis of Predictive Control in Linear Time Varying Systems
 create mode 100644 data/2021/neurips/Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
 create mode 100644 data/2021/neurips/PettingZoo: Gym for Multi-Agent Reinforcement Learning
 create mode 100644 data/2021/neurips/Photonic Differential Privacy with Direct Feedback Alignment
 create mode 100644 data/2021/neurips/Physics-Aware Downsampling with Deep Learning for Scalable Flood Modeling
 create mode 100644 data/2021/neurips/Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling
 create mode 100644 data/2021/neurips/PiRank: Scalable Learning To Rank via Differentiable Sorting
 create mode 100644 data/2021/neurips/Pipeline Combinators for Gradual AutoML
 create mode 100644 data/2021/neurips/Piper: Multidimensional Planner for DNN Parallelization
 create mode 100644 data/2021/neurips/Planning from Pixels in Environments with Combinatorially Hard Search Spaces
 create mode 100644 data/2021/neurips/Play to Grade: Testing Coding Games as Classifying Markov Decision Process
 create mode 100644 data/2021/neurips/PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning
 create mode 100644 data/2021/neurips/Pointwise Bounds for Distribution Estimation under Communication Constraints
 create mode 100644 data/2021/neurips/PolarStream: Streaming Object Detection and Segmentation with Polar Pillars
 create mode 100644 data/2021/neurips/Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
 create mode 100644 data/2021/neurips/Policy Learning Using Weak Supervision
 create mode 100644 data/2021/neurips/Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
 create mode 100644 data/2021/neurips/Pooling by Sliced-Wasserstein Embedding
 create mode 100644 data/2021/neurips/PortaSpeech: Portable and High-Quality Generative Text-to-Speech
 create mode 100644 data/2021/neurips/Post-Contextual-Bandit Inference
 create mode 100644 data/2021/neurips/Post-Training Quantization for Vision Transformer
 create mode 100644 data/2021/neurips/Post-Training Sparsity-Aware Quantization
 create mode 100644 data/2021/neurips/Post-processing for Individual Fairness
 create mode 100644 data/2021/neurips/Posterior Collapse and Latent Variable Non-identifiability
 create mode 100644 data/2021/neurips/Posterior Meta-Replay for Continual Learning
 create mode 100644 data/2021/neurips/Powerpropagation: A sparsity inducing weight reparameterisation
 create mode 100644 data/2021/neurips/Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient
 create mode 100644 data/2021/neurips/Practical Near Neighbor Search via Group Testing
 create mode 100644 data/2021/neurips/Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers
 create mode 100644 data/2021/neurips/Pragmatic Image Compression for Human-in-the-Loop Decision-Making
 create mode 100644 data/2021/neurips/Precise characterization of the prior predictive distribution of deep ReLU networks
 create mode 100644 data/2021/neurips/Preconditioned Gradient Descent for Over-Parameterized Nonconvex Matrix Factorization
 create mode 100644 data/2021/neurips/Predicting Deep Neural Network Generalization with Perturbation Response Curves
 create mode 100644 data/2021/neurips/Predicting Event Memorability from Contextual Visual Semantics
 create mode 100644 data/2021/neurips/Predicting Molecular Conformation via Dynamic Graph Score Matching
 create mode 100644 data/2021/neurips/Predicting What You Already Know Helps: Provable Self-Supervised Learning
 create mode 100644 data/2021/neurips/Predify: Augmenting deep neural networks with brain-inspired predictive coding dynamics
 create mode 100644 data/2021/neurips/PreferenceNet: Encoding Human Preferences in Auction Design with Deep Learning
 create mode 100644 data/2021/neurips/Preserved central model for faster bidirectional compression in distributed settings
 create mode 100644 data/2021/neurips/Pretraining Representations for Data-Efficient Reinforcement Learning
 create mode 100644 data/2021/neurips/Prior-independent Dynamic Auctions for a Value-maximizing Buyer
 create mode 100644 data/2021/neurips/Private Non-smooth ERM and SCO in Subquadratic Steps
 create mode 100644 data/2021/neurips/Private and Non-private Uniformity Testing for Ranking Data
 create mode 100644 data/2021/neurips/Private learning implies quantum stability
 create mode 100644 data/2021/neurips/Privately Learning Mixtures of Axis-Aligned Gaussians
 create mode 100644 data/2021/neurips/Privately Learning Subspaces
 create mode 100644 data/2021/neurips/Privately Publishable Per-instance Privacy
 create mode 100644 data/2021/neurips/ProTo: Program-Guided Transformer for Program-Guided Tasks
 create mode 100644 data/2021/neurips/Probabilistic Attention for Interactive Segmentation
 create mode 100644 data/2021/neurips/Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs
 create mode 100644 data/2021/neurips/Probabilistic Forecasting: A Level-Set Approach
 create mode 100644 data/2021/neurips/Probabilistic Margins for Instance Reweighting in Adversarial Training
 create mode 100644 data/2021/neurips/Probabilistic Tensor Decomposition of Neural Population Spiking Activity
 create mode 100644 data/2021/neurips/Probabilistic Transformer For Time Series Analysis
 create mode 100644 data/2021/neurips/Probability Paths and the Structure of Predictions over Time
 create mode 100644 data/2021/neurips/Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training
 create mode 100644 data/2021/neurips/Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
 create mode 100644 data/2021/neurips/Profiling Pareto Front With Multi-Objective Stein Variational Gradient Descent
 create mode 100644 data/2021/neurips/Program Synthesis Guided Reinforcement Learning for Partially Observed Environments
 create mode 100644 data/2021/neurips/Progressive Coordinate Transforms for Monocular 3D Object Detection
 create mode 100644 data/2021/neurips/Progressive Feature Interaction Search for Deep Sparse Network
 create mode 100644 data/2021/neurips/Projected GANs Converge Faster
 create mode 100644 data/2021/neurips/Proper Value Equivalence
 create mode 100644 data/2021/neurips/Property-Aware Relation Networks for Few-Shot Molecular Property Prediction
 create mode 100644 data/2021/neurips/Proportional Participatory Budgeting with Additive Utilities
 create mode 100644 data/2021/neurips/Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation
 create mode 100644 data/2021/neurips/Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning
 create mode 100644 data/2021/neurips/Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss
 create mode 100644 data/2021/neurips/Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature
 create mode 100644 data/2021/neurips/Provable Representation Learning for Imitation with Contrastive Fourier Features
 create mode 100644 data/2021/neurips/Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning
 create mode 100644 data/2021/neurips/Provably Efficient Causal Reinforcement Learning with Confounded Observational Data
 create mode 100644 data/2021/neurips/Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints
 create mode 100644 data/2021/neurips/Provably Faster Algorithms for Bilevel Optimization
 create mode 100644 data/2021/neurips/Provably Strict Generalisation Benefit for Invariance in Kernel Methods
 create mode 100644 data/2021/neurips/Provably efficient multi-task reinforcement learning with model transfer
 create mode 100644 data/2021/neurips/Provably efficient, succinct, and precise explanations
 create mode 100644 data/2021/neurips/Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent
 create mode 100644 data/2021/neurips/Proxy-Normalizing Activations to Match Batch Normalization while Removing Batch Dependence
 create mode 100644 data/2021/neurips/Pruning Randomly Initialized Neural Networks with Iterative Randomization
 create mode 100644 data/2021/neurips/Pseudo-Spherical Contrastive Divergence
 create mode 100644 data/2021/neurips/Pure Exploration in Kernel and Neural Bandits
 create mode 100644 data/2021/neurips/Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples
 create mode 100644 data/2021/neurips/Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes
 create mode 100644 data/2021/neurips/QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning
 create mode 100644 data/2021/neurips/Quantifying and Improving Transferability in Domain Generalization
 create mode 100644 data/2021/neurips/R-Drop: Regularized Dropout for Neural Networks
 create mode 100644 data/2021/neurips/RED : Looking for Redundancies for Data-FreeStructured Compression of Deep Neural Networks
 create mode 100644 data/2021/neurips/REMIPS: Physically Consistent 3D Reconstruction of Multiple Interacting People under Weak Supervision
 create mode 100644 data/2021/neurips/RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning
 create mode 100644 data/2021/neurips/RIM: Reliable Influence-based Active Learning on Graphs
 create mode 100644 data/2021/neurips/RL for Latent MDPs: Regret Guarantees and a Lower Bound
 create mode 100644 data/2021/neurips/RLlib Flow: Distributed Reinforcement Learning is a Dataflow Problem
 create mode 100644 data/2021/neurips/RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents
 create mode 100644 data/2021/neurips/RMM: Reinforced Memory Management for Class-Incremental Learning
 create mode 100644 data/2021/neurips/Random Noise Defense Against Query-Based Black-Box Attacks
 create mode 100644 data/2021/neurips/Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems
 create mode 100644 data/2021/neurips/Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery
 create mode 100644 data/2021/neurips/Ranking Policy Decisions
 create mode 100644 data/2021/neurips/Rate-Optimal Subspace Estimation on Random Graphs
 create mode 100644 data/2021/neurips/Rates of Estimation of Optimal Transport Maps using Plug-in Estimators via Barycentric Projections
 create mode 100644 data/2021/neurips/Raw Nav-merge Seismic Data to Subsurface Properties with MLP based Multi-Modal Information Unscrambler
 create mode 100644 data/2021/neurips/Re-ranking for image retrieval and transductive few-shot classification
 create mode 100644 data/2021/neurips/ReAct: Out-of-distribution Detection With Rectified Activations
 create mode 100644 data/2021/neurips/ReLU Regression with Massart Noise
 create mode 100644 data/2021/neurips/ReSSL: Relational Self-Supervised Learning with Weak Augmentation
 create mode 100644 data/2021/neurips/Realistic evaluation of transductive few-shot learning
 create mode 100644 data/2021/neurips/Rebooting ACGAN: Auxiliary Classifier GANs with Stable Training
 create mode 100644 data/2021/neurips/Rebounding Bandits for Modeling Satiation Effects
 create mode 100644 data/2021/neurips/Recognizing Vector Graphics without Rasterization
 create mode 100644 data/2021/neurips/Reconstruction for Powerful Graph Representations
 create mode 100644 data/2021/neurips/Recovering Latent Causal Factor for Generalization to Distributional Shifts
 create mode 100644 data/2021/neurips/Recovery Analysis for Plug-and-Play Priors using the Restricted Eigenvalue Condition
 create mode 100644 data/2021/neurips/Rectangular Flows for Manifold Learning
 create mode 100644 data/2021/neurips/Rectifying the Shortcut Learning of Background for Few-Shot Learning
 create mode 100644 data/2021/neurips/Recurrence along Depth: Deep Convolutional Neural Networks with Recurrent Layer Aggregation
 create mode 100644 data/2021/neurips/Recurrent Bayesian Classifier Chains for Exact Multi-Label Classification
 create mode 100644 data/2021/neurips/Recurrent Submodular Welfare and Matroid Blocking Semi-Bandits
 create mode 100644 data/2021/neurips/Recursive Bayesian Networks: Generalising and Unifying Probabilistic Context-Free Grammars and Dynamic Bayesian Networks
 create mode 100644 data/2021/neurips/Recursive Causal Structure Learning in the Presence of Latent Variables and Selection Bias
 create mode 100644 data/2021/neurips/Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems
 create mode 100644 data/2021/neurips/Reducing Collision Checking for Sampling-Based Motion Planning Using Graph Neural Networks
 create mode 100644 data/2021/neurips/Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation
 create mode 100644 data/2021/neurips/Reducing the Covariate Shift by Mirror Samples in Cross Domain Alignment
 create mode 100644 data/2021/neurips/Referring Transformer: A One-step Approach to Multi-task Visual Grounding
 create mode 100644 data/2021/neurips/Refined Learning Bounds for Kernel and Approximate $k$-Means
 create mode 100644 data/2021/neurips/Refining Language Models with Compositional Explanations
 create mode 100644 data/2021/neurips/Reformulating Zero-shot Action Recognition for Multi-label Actions
 create mode 100644 data/2021/neurips/Regime Switching Bandits
 create mode 100644 data/2021/neurips/Regret Bounds for Gaussian-Process Optimization in Large Domains
 create mode 100644 data/2021/neurips/Regret Minimization Experience Replay in Off-Policy Reinforcement Learning
 create mode 100644 data/2021/neurips/Regularization in ResNet with Stochastic Depth
 create mode 100644 data/2021/neurips/Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond
 create mode 100644 data/2021/neurips/Regularized Softmax Deep Multi-Agent Q-Learning
 create mode 100644 data/2021/neurips/Regulating algorithmic filtering on social media
 create mode 100644 data/2021/neurips/Reinforced Few-Shot Acquisition Function Learning for Bayesian Optimization
 create mode 100644 data/2021/neurips/Reinforcement Learning Enhanced Explainer for Graph Neural Networks
 create mode 100644 data/2021/neurips/Reinforcement Learning based Disease Progression Model for Alzheimer's Disease
 create mode 100644 data/2021/neurips/Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
 create mode 100644 data/2021/neurips/Reinforcement Learning in Newcomblike Environments
 create mode 100644 data/2021/neurips/Reinforcement Learning in Reward-Mixing MDPs
 create mode 100644 data/2021/neurips/Reinforcement Learning with Latent Flow
 create mode 100644 data/2021/neurips/Reinforcement Learning with State Observation Costs in Action-Contingent Noiselessly Observable Markov Decision Processes
 create mode 100644 data/2021/neurips/Reinforcement learning for optimization of variational quantum circuit architectures
 create mode 100644 data/2021/neurips/Relational Self-Attention: What's Missing in Attention for Video Understanding
 create mode 100644 data/2021/neurips/Relative Flatness and Generalization
 create mode 100644 data/2021/neurips/Relative Uncertainty Learning for Facial Expression Recognition
 create mode 100644 data/2021/neurips/Relative stability toward diffeomorphisms indicates performance in deep nets
 create mode 100644 data/2021/neurips/Relaxed Marginal Consistency for Differentially Private Query Answering
 create mode 100644 data/2021/neurips/Relaxing Local Robustness
 create mode 100644 data/2021/neurips/RelaySum for Decentralized Deep Learning on Heterogeneous Data
 create mode 100644 data/2021/neurips/Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions
 create mode 100644 data/2021/neurips/Reliable Decisions with Threshold Calibration
 create mode 100644 data/2021/neurips/Reliable Estimation of KL Divergence using a Discriminator in Reproducing Kernel Hilbert Space
 create mode 100644 data/2021/neurips/Reliable Post hoc Explanations: Modeling Uncertainty in Explainability
 create mode 100644 data/2021/neurips/Reliable and Trustworthy Machine Learning for Health Using Dataset Shift Detection
 create mode 100644 data/2021/neurips/Remember What You Want to Forget: Algorithms for Machine Unlearning
 create mode 100644 data/2021/neurips/Removing Inter-Experimental Variability from Functional Data in Systems Neuroscience
 create mode 100644 data/2021/neurips/Renyi Differential Privacy of The Subsampled Shuffle Model In Distributed Learning
 create mode 100644 data/2021/neurips/Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification
 create mode 100644 data/2021/neurips/Replay-Guided Adversarial Environment Design
 create mode 100644 data/2021/neurips/Representation Costs of Linear Neural Networks: Analysis and Design
 create mode 100644 data/2021/neurips/Representation Learning Beyond Linear Prediction Functions
 create mode 100644 data/2021/neurips/Representation Learning for Event-based Visuomotor Policies
 create mode 100644 data/2021/neurips/Representation Learning on Spatial Networks
 create mode 100644 data/2021/neurips/Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models
 create mode 100644 data/2021/neurips/Representing Hyperbolic Space Accurately using Multi-Component Floats
 create mode 100644 data/2021/neurips/Representing Long-Range Context for Graph Neural Networks with Global Attention
 create mode 100644 data/2021/neurips/Repulsive Deep Ensembles are Bayesian
 create mode 100644 data/2021/neurips/ResNEsts and DenseNEsts: Block-based DNN Models with Improved Representation Guarantees
 create mode 100644 data/2021/neurips/ResT: An Efficient Transformer for Visual Recognition
 create mode 100644 data/2021/neurips/Residual Pathway Priors for Soft Equivariance Constraints
 create mode 100644 data/2021/neurips/Residual Relaxation for Multi-view Representation Learning
 create mode 100644 data/2021/neurips/Residual2Vec: Debiasing graph embedding with random graphs
 create mode 100644 data/2021/neurips/Rethinking Calibration of Deep Neural Networks: Do Not Be Afraid of Overconfidence
 create mode 100644 data/2021/neurips/Rethinking Graph Transformers with Spectral Attention
 create mode 100644 data/2021/neurips/Rethinking Neural Operations for Diverse Tasks
 create mode 100644 data/2021/neurips/Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
 create mode 100644 data/2021/neurips/Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization
 create mode 100644 data/2021/neurips/Rethinking conditional GAN training: An approach using geometrically structured latent manifolds
 create mode 100644 data/2021/neurips/Rethinking gradient sparsification as total error minimization
 create mode 100644 data/2021/neurips/Rethinking the Pruning Criteria for Convolutional Neural Network
 create mode 100644 data/2021/neurips/Rethinking the Variational Interpretation of Accelerated Optimization Methods
 create mode 100644 data/2021/neurips/Retiring Adult: New Datasets for Fair Machine Learning
 create mode 100644 data/2021/neurips/Reusing Combinatorial Structure: Faster Iterative Projections over Submodular Base Polytopes
 create mode 100644 data/2021/neurips/Revealing and Protecting Labels in Distributed Training
 create mode 100644 data/2021/neurips/Revenue maximization via machine learning with noisy data
 create mode 100644 data/2021/neurips/Reverse engineering learned optimizers reveals known and novel mechanisms
 create mode 100644 data/2021/neurips/Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems
 create mode 100644 data/2021/neurips/Reverse-Complement Equivariant Networks for DNA Sequences
 create mode 100644 data/2021/neurips/Revisit Multimodal Meta-Learning through the Lens of Multi-Task Learning
 create mode 100644 data/2021/neurips/Revisiting 3D Object Detection From an Egocentric Perspective
 create mode 100644 data/2021/neurips/Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations
 create mode 100644 data/2021/neurips/Revisiting Deep Learning Models for Tabular Data
 create mode 100644 data/2021/neurips/Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme
 create mode 100644 data/2021/neurips/Revisiting Hilbert-Schmidt Information Bottleneck for Adversarial Robustness
 create mode 100644 data/2021/neurips/Revisiting Model Stitching to Compare Neural Representations
 create mode 100644 data/2021/neurips/Revisiting ResNets: Improved Training and Scaling Strategies
 create mode 100644 data/2021/neurips/Revisiting Smoothed Online Learning
 create mode 100644 data/2021/neurips/Revisiting the Calibration of Modern Neural Networks
 create mode 100644 data/2021/neurips/Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning
 create mode 100644 data/2021/neurips/Reward is enough for convex MDPs
 create mode 100644 data/2021/neurips/Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation
 create mode 100644 data/2021/neurips/Risk Bounds and Calibration for a Smart Predict-then-Optimize Method
 create mode 100644 data/2021/neurips/Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures
 create mode 100644 data/2021/neurips/Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning
 create mode 100644 data/2021/neurips/Risk Monotonicity in Statistical Learning
 create mode 100644 data/2021/neurips/Risk-Averse Bayes-Adaptive Reinforcement Learning
 create mode 100644 data/2021/neurips/Risk-Aware Transfer in Reinforcement Learning using Successor Features
 create mode 100644 data/2021/neurips/Risk-averse Heteroscedastic Bayesian Optimization
 create mode 100644 data/2021/neurips/RoMA: Robust Model Adaptation for Offline Model-based Optimization
 create mode 100644 data/2021/neurips/Robust Allocations with Diversity Constraints
 create mode 100644 data/2021/neurips/Robust Auction Design in the Auto-bidding World
 create mode 100644 data/2021/neurips/Robust Compressed Sensing MRI with Deep Generative Priors
 create mode 100644 data/2021/neurips/Robust Contrastive Learning Using Negative Samples with Diminished Semantics
 create mode 100644 data/2021/neurips/Robust Counterfactual Explanations on Graph Neural Networks
 create mode 100644 data/2021/neurips/Robust Deep Reinforcement Learning through Adversarial Loss
 create mode 100644 data/2021/neurips/Robust Generalization despite Distribution Shift via Minimum Discriminating Information
 create mode 100644 data/2021/neurips/Robust Implicit Networks via Non-Euclidean Contractions
 create mode 100644 data/2021/neurips/Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch
 create mode 100644 data/2021/neurips/Robust Learning of Optimal Auctions
 create mode 100644 data/2021/neurips/Robust Online Correlation Clustering
 create mode 100644 data/2021/neurips/Robust Optimization for Multilingual Translation with Imbalanced Data
 create mode 100644 data/2021/neurips/Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference
 create mode 100644 data/2021/neurips/Robust Predictable Control
 create mode 100644 data/2021/neurips/Robust Regression Revisited: Acceleration and Improved Estimation Rates
 create mode 100644 data/2021/neurips/Robust Visual Reasoning via Language Guided Neural Module Networks
 create mode 100644 data/2021/neurips/Robust and Decomposable Average Precision for Image Retrieval
 create mode 100644 data/2021/neurips/Robust and Fully-Dynamic Coreset for Continuous-and-Bounded Learning (With Outliers) Problems
 create mode 100644 data/2021/neurips/Robust and differentially private mean estimation
 create mode 100644 data/2021/neurips/Robustifying Algorithms of Learning Latent Trees with Vector Variables
 create mode 100644 data/2021/neurips/Robustness between the worst and average case
 create mode 100644 data/2021/neurips/Robustness of Graph Neural Networks at Scale
 create mode 100644 data/2021/neurips/Robustness via Uncertainty-aware Cycle Consistency
 create mode 100644 data/2021/neurips/Rot-Pro: Modeling Transitivity by Projection in Knowledge Graph Embedding
 create mode 100644 data/2021/neurips/Roto-translated Local Coordinate Frames For Interacting Dynamical Systems
 create mode 100644 data/2021/neurips/Row-clustering of a Point Process-valued Matrix
 create mode 100644 data/2021/neurips/SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL
 create mode 100644 data/2021/neurips/SAPE: Spatially-Adaptive Progressive Encoding for Neural Optimization
 create mode 100644 data/2021/neurips/SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization
 create mode 100644 data/2021/neurips/SE(3)-equivariant prediction of molecular wavefunctions and electronic densities
 create mode 100644 data/2021/neurips/SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency
 create mode 100644 data/2021/neurips/SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs
 create mode 100644 data/2021/neurips/SILG: The Multi-domain Symbolic Interactive Language Grounding Benchmark
 create mode 100644 data/2021/neurips/SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios
 create mode 100644 data/2021/neurips/SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition
 create mode 100644 data/2021/neurips/SLAPS: Self-Supervision Improves Structure Learning for Graph Neural Networks
 create mode 100644 data/2021/neurips/SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression
 create mode 100644 data/2021/neurips/SNIPS: Solving Noisy Inverse Problems Stochastically
 create mode 100644 data/2021/neurips/SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
 create mode 100644 data/2021/neurips/SOFT: Softmax-free Transformer with Linear Complexity
 create mode 100644 data/2021/neurips/SOLQ: Segmenting Objects by Learning Queries
 create mode 100644 data/2021/neurips/SOPE: Spectrum of Off-Policy Estimators
 create mode 100644 data/2021/neurips/SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search
 create mode 100644 data/2021/neurips/SQALER: Scaling Question Answering by Decoupling Multi-Hop and Logical Reasoning
 create mode 100644 data/2021/neurips/SSAL: Synergizing between Self-Training and Adversarial Learning for Domain Adaptive Object Detection
 create mode 100644 data/2021/neurips/SSMF: Shifting Seasonal Matrix Factorization
 create mode 100644 data/2021/neurips/SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning
 create mode 100644 data/2021/neurips/STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning
 create mode 100644 data/2021/neurips/STEP: Out-of-Distribution Detection in the Presence of Limited In-Distribution Labeled Data
 create mode 100644 data/2021/neurips/STORM+: Fully Adaptive SGD with Recursive Momentum for Nonconvex Optimization
 create mode 100644 data/2021/neurips/SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients
 create mode 100644 data/2021/neurips/SWAD: Domain Generalization by Seeking Flat Minima
 create mode 100644 data/2021/neurips/Safe Policy Optimization with Local Generalized Linear Function Approximations
 create mode 100644 data/2021/neurips/Safe Pontryagin Differentiable Programming
 create mode 100644 data/2021/neurips/Safe Reinforcement Learning by Imagining the Near Future
 create mode 100644 data/2021/neurips/Safe Reinforcement Learning with Natural Language Constraints
 create mode 100644 data/2021/neurips/Sageflow: Robust Federated Learning against Both Stragglers and Adversaries
 create mode 100644 data/2021/neurips/SalKG: Learning From Knowledge Graph Explanations for Commonsense Reasoning
 create mode 100644 data/2021/neurips/Sample Complexity Bounds for Active Ranking from Multi-wise Comparisons
 create mode 100644 data/2021/neurips/Sample Complexity of Tree Search Configuration: Cutting Planes and Beyond
 create mode 100644 data/2021/neurips/Sample Selection for Fair and Robust Training
 create mode 100644 data/2021/neurips/Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games
 create mode 100644 data/2021/neurips/Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
 create mode 100644 data/2021/neurips/Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model
 create mode 100644 data/2021/neurips/Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot?
 create mode 100644 data/2021/neurips/Scalable Bayesian GPFA with automatic relevance determination and discrete noise models
 create mode 100644 data/2021/neurips/Scalable Diverse Model Selection for Accessible Transfer Learning
 create mode 100644 data/2021/neurips/Scalable Inference in SDEs by Direct Matching of the Fokker-Planck-Kolmogorov Equation
 create mode 100644 data/2021/neurips/Scalable Inference of Sparsely-changing Gaussian Markov Random Fields
 create mode 100644 data/2021/neurips/Scalable Intervention Target Estimation in Linear Models
 create mode 100644 data/2021/neurips/Scalable Neural Data Server: A Data Recommender for Transfer Learning
 create mode 100644 data/2021/neurips/Scalable Online Planning via Reinforcement Learning Fine-Tuning
 create mode 100644 data/2021/neurips/Scalable Quasi-Bayesian Inference for Instrumental Variable Regression
 create mode 100644 data/2021/neurips/Scalable Rule-Based Representation Learning for Interpretable Classification
 create mode 100644 data/2021/neurips/Scalable Thompson Sampling using Sparse Gaussian Process Models
 create mode 100644 data/2021/neurips/Scalable and Stable Surrogates for Flexible Classifiers with Fairness Constraints
 create mode 100644 data/2021/neurips/Scalars are universal: Equivariant machine learning, structured like classical physics
 create mode 100644 data/2021/neurips/ScaleCert: Scalable Certified Defense against Adversarial Patches with Sparse Superficial Layers
 create mode 100644 data/2021/neurips/Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets
 create mode 100644 data/2021/neurips/Scaling Gaussian Processes with Derivative Information Using Variational Inference
 create mode 100644 data/2021/neurips/Scaling Neural Tangent Kernels via Sketching and Random Features
 create mode 100644 data/2021/neurips/Scaling Up Exact Neural Network Compression by ReLU Stability
 create mode 100644 data/2021/neurips/Scaling Vision with Sparse Mixture of Experts
 create mode 100644 data/2021/neurips/Scaling up Continuous-Time Markov Chains Helps Resolve Underspecification
 create mode 100644 data/2021/neurips/Scallop: From Probabilistic Deductive Databases to Scalable Differentiable Reasoning
 create mode 100644 data/2021/neurips/Scatterbrain: Unifying Sparse and Low-rank Attention
 create mode 100644 data/2021/neurips/Scheduling jobs with stochastic holding costs
 create mode 100644 data/2021/neurips/Score-based Generative Modeling in Latent Space
 create mode 100644 data/2021/neurips/Score-based Generative Neural Networks for Large-Scale Optimal Transport
 create mode 100644 data/2021/neurips/Searching Parameterized AP Loss for Object Detection
 create mode 100644 data/2021/neurips/Searching for Efficient Transformers for Language Modeling
 create mode 100644 data/2021/neurips/Searching the Search Space of Vision Transformer
 create mode 100644 data/2021/neurips/Second-Order Neural ODE Optimizer
 create mode 100644 data/2021/neurips/See More for Scene: Pairwise Consistency Learning for Scene Classification
 create mode 100644 data/2021/neurips/SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
 create mode 100644 data/2021/neurips/Selective Sampling for Online Best-arm Identification
 create mode 100644 data/2021/neurips/Self-Adaptable Point Processes with Nonparametric Time Decays
 create mode 100644 data/2021/neurips/Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning
 create mode 100644 data/2021/neurips/Self-Consistent Models and Values
 create mode 100644 data/2021/neurips/Self-Diagnosing GAN: Diagnosing Underrepresented Samples in Generative Adversarial Networks
 create mode 100644 data/2021/neurips/Self-Instantiated Recurrent Units with Dynamic Soft Recursion
 create mode 100644 data/2021/neurips/Self-Interpretable Model with Transformation Equivariant Interpretation
 create mode 100644 data/2021/neurips/Self-Paced Contrastive Learning for Semi-supervised Medical Image Segmentation with Meta-labels
 create mode 100644 data/2021/neurips/Self-Supervised Bug Detection and Repair
 create mode 100644 data/2021/neurips/Self-Supervised GANs with Label Augmentation
 create mode 100644 data/2021/neurips/Self-Supervised Learning Disentangled Group Representation as Feature
 create mode 100644 data/2021/neurips/Self-Supervised Learning of Event-Based Optical Flow with Spiking Neural Networks
 create mode 100644 data/2021/neurips/Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style
 create mode 100644 data/2021/neurips/Self-Supervised Learning with Kernel Dependence Maximization
 create mode 100644 data/2021/neurips/Self-Supervised Multi-Object Tracking with Cross-input Consistency
 create mode 100644 data/2021/neurips/Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction
 create mode 100644 data/2021/neurips/Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning
 create mode 100644 data/2021/neurips/Semialgebraic Representation of Monotone Deep Equilibrium Models and Applications to Certification
 create mode 100644 data/2021/neurips/Separation Results between Fixed-Kernel and Feature-Learning Probability Metrics
 create mode 100644 data/2021/neurips/Sequence-to-Sequence Learning with Latent Neural Grammars
 create mode 100644 data/2021/neurips/Sequential Algorithms for Testing Closeness of Distributions
 create mode 100644 data/2021/neurips/Sequential Causal Imitation Learning with Unobserved Confounders
 create mode 100644 data/2021/neurips/Set Prediction in the Latent Space
 create mode 100644 data/2021/neurips/Settling the Variance of Multi-Agent Policy Gradients
 create mode 100644 data/2021/neurips/Shape As Points: A Differentiable Poisson Solver
 create mode 100644 data/2021/neurips/Shape Registration in the Time of Transformers
 create mode 100644 data/2021/neurips/Shape from Blur: Recovering Textured 3D Shape and Motion of Fast Moving Objects
 create mode 100644 data/2021/neurips/Shape your Space: A Gaussian Mixture Regularization Approach to Deterministic Autoencoders
 create mode 100644 data/2021/neurips/Shapeshifter: a Parameter-efficient Transformer using Factorized Reshaped Matrices
 create mode 100644 data/2021/neurips/Shaping embodied agent behavior with activity-context priors from egocentric video
 create mode 100644 data/2021/neurips/Shapley Residuals: Quantifying the limits of the Shapley value for explanations
 create mode 100644 data/2021/neurips/Shared Independent Component Analysis for Multi-Subject Neuroimaging
 create mode 100644 data/2021/neurips/Sharp Impossibility Results for Hyper-graph Testing
 create mode 100644 data/2021/neurips/Shift Invariance Can Reduce Adversarial Robustness
 create mode 100644 data/2021/neurips/Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training data
 create mode 100644 data/2021/neurips/Shifted Chunk Transformer for Spatio-Temporal Representational Learning
 create mode 100644 data/2021/neurips/Sifting through the noise: Universal first-order methods for stochastic variational inequalities
 create mode 100644 data/2021/neurips/Sim and Real: Better Together
 create mode 100644 data/2021/neurips/SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement
 create mode 100644 data/2021/neurips/Similarity and Matching of Neural Network Representations
 create mode 100644 data/2021/neurips/Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning
 create mode 100644 data/2021/neurips/Simple steps are all you need: Frank-Wolfe and generalized self-concordant functions
 create mode 100644 data/2021/neurips/Single Layer Predictive Normalized Maximum Likelihood for Out-of-Distribution Detection
 create mode 100644 data/2021/neurips/SketchGen: Generating Constrained CAD Sketches
 create mode 100644 data/2021/neurips/Skipping the Frame-Level: Event-Based Piano Transcription With Neural Semi-CRFs
 create mode 100644 data/2021/neurips/Slice Sampling Reparameterization Gradients
 create mode 100644 data/2021/neurips/Sliced Mutual Information: A Scalable Measure of Statistical Dependence
 create mode 100644 data/2021/neurips/Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation
 create mode 100644 data/2021/neurips/Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction
 create mode 100644 data/2021/neurips/Smooth Bilevel Programming for Sparse Regularization
 create mode 100644 data/2021/neurips/Smooth Normalizing Flows
 create mode 100644 data/2021/neurips/SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness
 create mode 100644 data/2021/neurips/Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization
 create mode 100644 data/2021/neurips/Snowflake: Scaling GNNs to high-dimensional continuous control via parameter freezing
 create mode 100644 data/2021/neurips/Soft Calibration Objectives for Neural Networks
 create mode 100644 data/2021/neurips/Solving Graph-based Public Goods Games with Tree Search and Imitation Learning
 create mode 100644 data/2021/neurips/Solving Min-Max Optimization with Hidden Structure via Gradient Descent Ascent
 create mode 100644 data/2021/neurips/Solving Soft Clustering Ensemble via $k$-Sparse Discrete Wasserstein Barycenter
 create mode 100644 data/2021/neurips/Space-time Mixing Attention for Video Transformer
 create mode 100644 data/2021/neurips/Sparse Deep Learning: A New Framework Immune to Local Traps and Miscalibration
 create mode 100644 data/2021/neurips/Sparse Flows: Pruning Continuous-depth Models
 create mode 100644 data/2021/neurips/Sparse Quadratic Optimisation over the Stiefel Manifold with Application to Permutation Synchronisation
 create mode 100644 data/2021/neurips/Sparse Spiking Gradient Descent
 create mode 100644 data/2021/neurips/Sparse Steerable Convolutions: An Efficient Learning of SE(3)-Equivariant Features for Estimation and Tracking of Object Poses in 3D Space
 create mode 100644 data/2021/neurips/Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
 create mode 100644 data/2021/neurips/Sparse Uncertainty Representation in Deep Learning with Inducing Weights
 create mode 100644 data/2021/neurips/Sparse is Enough in Scaling Transformers
 create mode 100644 data/2021/neurips/Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains
 create mode 100644 data/2021/neurips/Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework
 create mode 100644 data/2021/neurips/Spatial-Temporal Super-Resolution of Satellite Imagery via Conditional Pixel Synthesis
 create mode 100644 data/2021/neurips/Spatio-Temporal Variational Gaussian Processes
 create mode 100644 data/2021/neurips/Spatiotemporal Joint Filter Decomposition in 3D Convolutional Neural Networks
 create mode 100644 data/2021/neurips/Spectral embedding for dynamic networks with stability guarantees
 create mode 100644 data/2021/neurips/Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution
 create mode 100644 data/2021/neurips/Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network
 create mode 100644 data/2021/neurips/Speech-T: Transducer for Text to Speech and Beyond
 create mode 100644 data/2021/neurips/Speedy Performance Estimation for Neural Architecture Search
 create mode 100644 data/2021/neurips/Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay
 create mode 100644 data/2021/neurips/Spot the Difference: Detection of Topological Changes via Geometric Alignment
 create mode 100644 data/2021/neurips/Square Root Principal Component Pursuit: Tuning-Free Noisy Robust Matrix Recovery
 create mode 100644 data/2021/neurips/Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel
 create mode 100644 data/2021/neurips/Stability and Deviation Optimal Risk Bounds with Convergence Rate $O(1 n)$
 create mode 100644 data/2021/neurips/Stability and Generalization of Bilevel Programming in Hyperparameter Optimization
 create mode 100644 data/2021/neurips/Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation
 create mode 100644 data/2021/neurips/Stabilizing Dynamical Systems via Policy Gradient Methods
 create mode 100644 data/2021/neurips/Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks
 create mode 100644 data/2021/neurips/Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
 create mode 100644 data/2021/neurips/Stateful ODE-Nets using Basis Function Expansions
 create mode 100644 data/2021/neurips/Stateful Strategic Regression
 create mode 100644 data/2021/neurips/Statistical Inference with M-Estimators on Adaptively Collected Data
 create mode 100644 data/2021/neurips/Statistical Query Lower Bounds for List-Decodable Linear Regression
 create mode 100644 data/2021/neurips/Statistical Regeneration Guarantees of the Wasserstein Autoencoder with Latent Space Consistency
 create mode 100644 data/2021/neurips/Statistical Undecidability in Linear, Non-Gaussian Causal Models in the Presence of Latent Confounders
 create mode 100644 data/2021/neurips/Statistically and Computationally Efficient Linear Meta-representation Learning
 create mode 100644 data/2021/neurips/Stochastic Anderson Mixing for Nonconvex Stochastic Optimization
 create mode 100644 data/2021/neurips/Stochastic Bias-Reduced Gradient Methods
 create mode 100644 data/2021/neurips/Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity
 create mode 100644 data/2021/neurips/Stochastic Multi-Armed Bandits with Control Variates
 create mode 100644 data/2021/neurips/Stochastic Online Linear Regression: the Forward Algorithm to Replace Ridge
 create mode 100644 data/2021/neurips/Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence
 create mode 100644 data/2021/neurips/Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
 create mode 100644 data/2021/neurips/Stochastic Solutions for Linear Inverse Problems using the Prior Implicit in a Denoiser
 create mode 100644 data/2021/neurips/Stochastic bandits with groups of similar arms
 create mode 100644 data/2021/neurips/Stochastic optimization under time drift: iterate averaging, step-decay schedules, and high probability guarantees
 create mode 100644 data/2021/neurips/Storchastic: A Framework for General Stochastic Automatic Differentiation
 create mode 100644 data/2021/neurips/Strategic Behavior is Bliss: Iterative Voting Improves Social Welfare
 create mode 100644 data/2021/neurips/Streaming Linear System Identification with Reverse Experience Replay
 create mode 100644 data/2021/neurips/Stronger NAS with Weaker Predictors
 create mode 100644 data/2021/neurips/Structural Credit Assignment in Neural Networks using Reinforcement Learning
 create mode 100644 data/2021/neurips/Structure learning in polynomial time: Greedy algorithms, Bregman information, and exponential families
 create mode 100644 data/2021/neurips/Structure-Aware Random Fourier Kernel for Graphs
 create mode 100644 data/2021/neurips/Structured Denoising Diffusion Models in Discrete State-Spaces
 create mode 100644 data/2021/neurips/Structured Dropout Variational Inference for Bayesian Neural Networks
 create mode 100644 data/2021/neurips/Structured Reordering for Modeling Latent Alignments in Sequence Transduction
 create mode 100644 data/2021/neurips/Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training
 create mode 100644 data/2021/neurips/Stylized Dialogue Generation with Multi-Pass Dual Learning
 create mode 100644 data/2021/neurips/Sub-Linear Memory: How to Make Performers SLiM
 create mode 100644 data/2021/neurips/SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning
 create mode 100644 data/2021/neurips/Subgame solving without common knowledge
 create mode 100644 data/2021/neurips/Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning
 create mode 100644 data/2021/neurips/Subgoal Search For Complex Reasoning Tasks
 create mode 100644 data/2021/neurips/Subgraph Federated Learning with Missing Neighbor Generation
 create mode 100644 data/2021/neurips/Subgroup Generalization and Fairness of Graph Neural Networks
 create mode 100644 data/2021/neurips/Submodular + Concave
 create mode 100644 data/2021/neurips/Subquadratic Overparameterization for Shallow Neural Networks
 create mode 100644 data/2021/neurips/Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning
 create mode 100644 data/2021/neurips/Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer
 create mode 100644 data/2021/neurips/Supervising the Transfer of Reasoning Patterns in VQA
 create mode 100644 data/2021/neurips/Support Recovery of Sparse Signals from a Mixture of Linear Measurements
 create mode 100644 data/2021/neurips/Support vector machines and linear regression coincide with very high-dimensional features
 create mode 100644 data/2021/neurips/Surrogate Regret Bounds for Polyhedral Losses
 create mode 100644 data/2021/neurips/SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data
 create mode 100644 data/2021/neurips/SyMetric: Measuring the Quality of Learnt Hamiltonian Dynamics Inferred from Vision
 create mode 100644 data/2021/neurips/Symbolic Regression via Deep Reinforcement Learning Enhanced Genetic Programming Seeding
 create mode 100644 data/2021/neurips/Symplectic Adjoint Method for Exact Gradient of Neural ODE with Minimal Memory
 create mode 100644 data/2021/neurips/SyncTwin: Treatment Effect Estimation with Longitudinal Outcomes
 create mode 100644 data/2021/neurips/Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls
 create mode 100644 data/2021/neurips/Systematic Generalization with Edge Transformers
 create mode 100644 data/2021/neurips/T-LoHo: A Bayesian Regularization Model for Structured Sparsity and Smoothness on Graphs
 create mode 100644 data/2021/neurips/TAAC: Temporally Abstract Actor-Critic for Continuous Control
 create mode 100644 data/2021/neurips/TNASP: A Transformer-based NAS Predictor with a Self-evolution Framework
 create mode 100644 data/2021/neurips/TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation
 create mode 100644 data/2021/neurips/TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness
 create mode 100644 data/2021/neurips/TTT++: When Does Self-Supervised Test-Time Training Fail or Thrive?
 create mode 100644 data/2021/neurips/TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning
 create mode 100644 data/2021/neurips/Tactical Optimism and Pessimism for Deep Reinforcement Learning
 create mode 100644 data/2021/neurips/Tailoring: encoding inductive biases by optimizing unsupervised objectives at prediction time
 create mode 100644 data/2021/neurips/Taming Communication and Sample Complexities in Decentralized Policy Evaluation for Cooperative Multi-Agent Reinforcement Learning
 create mode 100644 data/2021/neurips/Targeted Neural Dynamical Modeling
 create mode 100644 data/2021/neurips/Task-Adaptive Neural Network Search with Meta-Contrastive Learning
 create mode 100644 data/2021/neurips/Task-Agnostic Undesirable Feature Deactivation Using Out-of-Distribution Data
 create mode 100644 data/2021/neurips/Taxonomizing local versus global structure in neural network loss landscapes
 create mode 100644 data/2021/neurips/Teachable Reinforcement Learning via Advice Distillation
 create mode 100644 data/2021/neurips/Teaching an Active Learner with Contrastive Examples
 create mode 100644 data/2021/neurips/Teaching via Best-Case Counterexamples in the Learning-with-Equivalence-Queries Paradigm
 create mode 100644 data/2021/neurips/Techniques for Symbol Grounding with SATNet
 create mode 100644 data/2021/neurips/Temporal-attentive Covariance Pooling Networks for Video Recognition
 create mode 100644 data/2021/neurips/Temporally Abstract Partial Models
 create mode 100644 data/2021/neurips/Tensor Normal Training for Deep Learning Models
 create mode 100644 data/2021/neurips/Tensor decompositions of higher-order correlations by nonlinear Hebbian plasticity
 create mode 100644 data/2021/neurips/Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs
 create mode 100644 data/2021/neurips/Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization
 create mode 100644 data/2021/neurips/Test-Time Personalization with a Transformer for Human Pose Estimation
 create mode 100644 data/2021/neurips/Test-time Collective Prediction
 create mode 100644 data/2021/neurips/TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks
 create mode 100644 data/2021/neurips/Testing Probabilistic Circuits
 create mode 100644 data/2021/neurips/The Adaptive Doubly Robust Estimator and a Paradox Concerning Logging Policy
 create mode 100644 data/2021/neurips/The Benefits of Implicit Regularization from SGD in Least Squares Problems
 create mode 100644 data/2021/neurips/The Causal-Neural Connection: Expressiveness, Learnability, and Inference
 create mode 100644 data/2021/neurips/The Complexity of Bayesian Network Learning: Revisiting the Superstructure
 create mode 100644 data/2021/neurips/The Complexity of Sparse Tensor PCA
 create mode 100644 data/2021/neurips/The Difficulty of Passive Learning in Deep Reinforcement Learning
 create mode 100644 data/2021/neurips/The Effect of the Intrinsic Dimension on the Generalization of Quadratic Classifiers
 create mode 100644 data/2021/neurips/The Elastic Lottery Ticket Hypothesis
 create mode 100644 data/2021/neurips/The Emergence of Objectness: Learning Zero-shot Segmentation from Videos
 create mode 100644 data/2021/neurips/The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization
 create mode 100644 data/2021/neurips/The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle
 create mode 100644 data/2021/neurips/The Image Local Autoregressive Transformer
 create mode 100644 data/2021/neurips/The Implicit Bias of Minima Stability: A View from Function Space
 create mode 100644 data/2021/neurips/The Inductive Bias of Quantum Kernels
 create mode 100644 data/2021/neurips/The Lazy Online Subgradient Algorithm is Universal on Strongly Convex Domains
 create mode 100644 data/2021/neurips/The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective
 create mode 100644 data/2021/neurips/The Limits of Optimal Pricing in the Dark
 create mode 100644 data/2021/neurips/The Many Faces of Adversarial Risk
 create mode 100644 data/2021/neurips/The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations
 create mode 100644 data/2021/neurips/The Pareto Frontier of model selection for general Contextual Bandits
 create mode 100644 data/2021/neurips/The Role of Global Labels in Few-Shot Classification and How to Infer Them
 create mode 100644 data/2021/neurips/The Semi-Random Satisfaction of Voting Axioms
 create mode 100644 data/2021/neurips/The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning
 create mode 100644 data/2021/neurips/The Skellam Mechanism for Differentially Private Federated Learning
 create mode 100644 data/2021/neurips/The Unbalanced Gromov Wasserstein Distance: Conic Formulation and Relaxation
 create mode 100644 data/2021/neurips/The Utility of Explainable AI in Ad Hoc Human-Machine Teaming
 create mode 100644 data/2021/neurips/The Value of Information When Deciding What to Learn
 create mode 100644 data/2021/neurips/The balancing principle for parameter choice in distance-regularized domain adaptation
 create mode 100644 data/2021/neurips/The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
 create mode 100644 data/2021/neurips/The convergence rate of regularized learning in games: From bandits and uncertainty to optimism and beyond
 create mode 100644 data/2021/neurips/The decomposition of the higher-order homology embedding constructed from the $k$-Laplacian
 create mode 100644 data/2021/neurips/The effectiveness of feature attribution methods and its correlation with automatic evaluation scores
 create mode 100644 data/2021/neurips/The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning
 create mode 100644 data/2021/neurips/The future is log-Gaussian: ResNets and their infinite-depth-and-width limit at initialization
 create mode 100644 data/2021/neurips/The staircase property: How hierarchical structure can guide deep learning
 create mode 100644 data/2021/neurips/There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning
 create mode 100644 data/2021/neurips/Think Big, Teach Small: Do Language Models Distil Occam's Razor?
 create mode 100644 data/2021/neurips/Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates
 create mode 100644 data/2021/neurips/Three-dimensional spike localization and improved motion correction for Neuropixels recordings
 create mode 100644 data/2021/neurips/Tight High Probability Bounds for Linear Stochastic Approximation with Fixed Stepsize
 create mode 100644 data/2021/neurips/Tighter Expected Generalization Error Bounds via Wasserstein Distance
 create mode 100644 data/2021/neurips/Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods
 create mode 100644 data/2021/neurips/Time-independent Generalization Bounds for SGLD in Non-convex Settings
 create mode 100644 data/2021/neurips/Time-series Generation by Contrastive Imitation
 create mode 100644 data/2021/neurips/To Beam Or Not To Beam: That is a Question of Cooperation for Language GANs
 create mode 100644 data/2021/neurips/To The Point: Correspondence-driven monocular 3D category reconstruction
 create mode 100644 data/2021/neurips/ToAlign: Task-Oriented Alignment for Unsupervised Domain Adaptation
 create mode 100644 data/2021/neurips/TokenLearner: Adaptive Space-Time Tokenization for Videos
 create mode 100644 data/2021/neurips/Topic Modeling Revisited: A Document Graph-based Neural Network Perspective
 create mode 100644 data/2021/neurips/TopicNet: Semantic Graph-Guided Topic Discovery
 create mode 100644 data/2021/neurips/Topographic VAEs learn Equivariant Capsules
 create mode 100644 data/2021/neurips/Topological Attention for Time Series Forecasting
 create mode 100644 data/2021/neurips/Topological Detection of Trojaned Neural Networks
 create mode 100644 data/2021/neurips/Topological Relational Learning on Graphs
 create mode 100644 data/2021/neurips/Topology-Imbalance Learning for Semi-Supervised Node Classification
 create mode 100644 data/2021/neurips/Towards Best-of-All-Worlds Online Learning with Feedback Graphs
 create mode 100644 data/2021/neurips/Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples
 create mode 100644 data/2021/neurips/Towards Biologically Plausible Convolutional Networks
 create mode 100644 data/2021/neurips/Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective
 create mode 100644 data/2021/neurips/Towards Context-Agnostic Learning Using Synthetic Data
 create mode 100644 data/2021/neurips/Towards Deeper Deep Reinforcement Learning with Spectral Normalization
 create mode 100644 data/2021/neurips/Towards Efficient and Effective Adversarial Training
 create mode 100644 data/2021/neurips/Towards Enabling Meta-Learning from Target Models
 create mode 100644 data/2021/neurips/Towards Gradient-based Bilevel Optimization with Non-convex Followers and Beyond
 create mode 100644 data/2021/neurips/Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
 create mode 100644 data/2021/neurips/Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
 create mode 100644 data/2021/neurips/Towards Lower Bounds on the Depth of ReLU Neural Networks
 create mode 100644 data/2021/neurips/Towards Multi-Grained Explainability for Graph Neural Networks
 create mode 100644 data/2021/neurips/Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach
 create mode 100644 data/2021/neurips/Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation
 create mode 100644 data/2021/neurips/Towards Robust Bisimulation Metric Learning
 create mode 100644 data/2021/neurips/Towards Robust and Reliable Algorithmic Recourse
 create mode 100644 data/2021/neurips/Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors
 create mode 100644 data/2021/neurips/Towards Sample-efficient Overparameterized Meta-learning
 create mode 100644 data/2021/neurips/Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN
 create mode 100644 data/2021/neurips/Towards Sharper Generalization Bounds for Structured Prediction
 create mode 100644 data/2021/neurips/Towards Stable and Robust AdderNets
 create mode 100644 data/2021/neurips/Towards Tight Communication Lower Bounds for Distributed Optimisation
 create mode 100644 data/2021/neurips/Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization
 create mode 100644 data/2021/neurips/Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond
 create mode 100644 data/2021/neurips/Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games
 create mode 100644 data/2021/neurips/Towards a Theoretical Framework of Out-of-Distribution Generalization
 create mode 100644 data/2021/neurips/Towards a Unified Game-Theoretic View of Adversarial Perturbations and Robustness
 create mode 100644 data/2021/neurips/Towards a Unified Information-Theoretic Framework for Generalization
 create mode 100644 data/2021/neurips/Towards mental time travel: a hierarchical memory for reinforcement learning agents
 create mode 100644 data/2021/neurips/Towards optimally abstaining from prediction with OOD test examples
 create mode 100644 data/2021/neurips/Towards robust vision by multi-task learning on monkey visual cortex
 create mode 100644 data/2021/neurips/Towards understanding retrosynthesis by energy-based models
 create mode 100644 data/2021/neurips/Tracking People with 3D Representations
 create mode 100644 data/2021/neurips/Tracking Without Re-recognition in Humans and Machines
 create mode 100644 data/2021/neurips/Tractable Density Estimation on Learned Manifolds with Conformal Embedding Flows
 create mode 100644 data/2021/neurips/Tractable Regularization of Probabilistic Circuits
 create mode 100644 data/2021/neurips/Training Certifiably Robust Neural Networks with Efficient Local Lipschitz Bounds
 create mode 100644 data/2021/neurips/Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State
 create mode 100644 data/2021/neurips/Training Neural Networks is ER-complete
 create mode 100644 data/2021/neurips/Training Neural Networks with Fixed Sparse Masks
 create mode 100644 data/2021/neurips/Training Over-parameterized Models with Non-decomposable Objectives
 create mode 100644 data/2021/neurips/Training for the Future: A Simple Gradient Interpolation Loss to Generalize Along Time
 create mode 100644 data/2021/neurips/TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up
 create mode 100644 data/2021/neurips/TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification
 create mode 100644 data/2021/neurips/TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification
 create mode 100644 data/2021/neurips/Transfer Learning of Graph Neural Networks with Ego-graph Information Maximization
 create mode 100644 data/2021/neurips/Transformer in Transformer
 create mode 100644 data/2021/neurips/TransformerFusion: Monocular RGB Scene Reconstruction using Transformers
 create mode 100644 data/2021/neurips/Transformers Generalize DeepSets and Can be Extended to Graphs & Hypergraphs
 create mode 100644 data/2021/neurips/Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation
 create mode 100644 data/2021/neurips/Tree in Tree: from Decision Trees to Decision Graphs
 create mode 100644 data/2021/neurips/TriBERT: Human-centric Audio-visual Representation Learning
 create mode 100644 data/2021/neurips/True Few-Shot Learning with Language Models
 create mode 100644 data/2021/neurips/Truncated Marginal Neural Ratio Estimation
 create mode 100644 data/2021/neurips/Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions
 create mode 100644 data/2021/neurips/Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
 create mode 100644 data/2021/neurips/Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL
 create mode 100644 data/2021/neurips/Turing Completeness of Bounded-Precision Recurrent Neural Networks
 create mode 100644 data/2021/neurips/Twice regularized MDPs and the equivalence between robustness and regularization
 create mode 100644 data/2021/neurips/Twins: Revisiting the Design of Spatial Attention in Vision Transformers
 create mode 100644 data/2021/neurips/Two Sides of Meta-Learning Evaluation: In vs. Out of Distribution
 create mode 100644 data/2021/neurips/Two steps to risk sensitivity
 create mode 100644 data/2021/neurips/Two-sided fairness in rankings via Lorenz dominance
 create mode 100644 data/2021/neurips/Two-step lookahead Bayesian optimization with inequality constraints
 create mode 100644 "data/2021/neurips/T\303\266RF: Time-of-Flight Radiance Fields for Dynamic Scene View Synthesis"
 create mode 100644 data/2021/neurips/UCB-based Algorithms for Multinomial Logistic Regression Bandits
 create mode 100644 data/2021/neurips/UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis
 create mode 100644 data/2021/neurips/USCO-Solver: Solving Undetermined Stochastic Combinatorial Optimization Problems
 create mode 100644 data/2021/neurips/Ultrahyperbolic Neural Networks
 create mode 100644 data/2021/neurips/Unadversarial Examples: Designing Objects for Robust Vision
 create mode 100644 data/2021/neurips/Unbalanced Optimal Transport through Non-negative Penalized Linear Regression
 create mode 100644 data/2021/neurips/Unbiased Classification through Bias-Contrastive and Bias-Balanced Learning
 create mode 100644 data/2021/neurips/Uncertain Decisions Facilitate Better Preference Learning
 create mode 100644 data/2021/neurips/Uncertainty Calibration for Ensemble-Based Debiasing Methods
 create mode 100644 data/2021/neurips/Uncertainty Quantification and Deep Ensembles
 create mode 100644 data/2021/neurips/Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
 create mode 100644 data/2021/neurips/Uncertainty-Driven Loss for Single Image Super-Resolution
 create mode 100644 data/2021/neurips/Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems
 create mode 100644 data/2021/neurips/Understanding Bandits with Graph Feedback
 create mode 100644 data/2021/neurips/Understanding Deflation Process in Over-parametrized Tensor Decomposition
 create mode 100644 data/2021/neurips/Understanding End-to-End Model-Based Reinforcement Learning Methods as Implicit Parameterization
 create mode 100644 data/2021/neurips/Understanding How Encoder-Decoder Architectures Attend
 create mode 100644 data/2021/neurips/Understanding Instance-based Interpretability of Variational Auto-Encoders
 create mode 100644 data/2021/neurips/Understanding Interlocking Dynamics of Cooperative Rationalization
 create mode 100644 data/2021/neurips/Understanding Negative Samples in Instance Discriminative Self-supervised Representation Learning
 create mode 100644 data/2021/neurips/Understanding Partial Multi-Label Learning via Mutual Information
 create mode 100644 data/2021/neurips/Understanding and Improving Early Stopping for Learning with Noisy Labels
 create mode 100644 data/2021/neurips/Understanding the Effect of Stochasticity in Policy Optimization
 create mode 100644 data/2021/neurips/Understanding the Generalization Benefit of Model Invariance from a Data Perspective
 create mode 100644 data/2021/neurips/Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning
 create mode 100644 data/2021/neurips/Understanding the Under-Coverage Bias in Uncertainty Estimation
 create mode 100644 data/2021/neurips/Unfolding Taylor's Approximations for Image Restoration
 create mode 100644 data/2021/neurips/UniDoc: Unified Pretraining Framework for Document Understanding
 create mode 100644 data/2021/neurips/Uniform Concentration Bounds toward a Unified Framework for Robust Clustering
 create mode 100644 data/2021/neurips/Uniform Convergence of Interpolators: Gaussian Width, Norm Bounds and Benign Overfitting
 create mode 100644 data/2021/neurips/Uniform Sampling over Episode Difficulty
 create mode 100644 data/2021/neurips/Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation
 create mode 100644 data/2021/neurips/Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation
 create mode 100644 data/2021/neurips/Unifying Width-Reduced Methods for Quasi-Self-Concordant Optimization
 create mode 100644 data/2021/neurips/Unifying lower bounds on prediction dimension of convex surrogates
 create mode 100644 data/2021/neurips/Unintended Selection: Persistent Qualification Rate Disparities and Interventions
 create mode 100644 data/2021/neurips/Unique sparse decomposition of low rank matrices
 create mode 100644 data/2021/neurips/Universal Approximation Using Well-Conditioned Normalizing Flows
 create mode 100644 data/2021/neurips/Universal Graph Convolutional Networks
 create mode 100644 data/2021/neurips/Universal Off-Policy Evaluation
 create mode 100644 data/2021/neurips/Universal Rate-Distortion-Perception Representations for Lossy Compression
 create mode 100644 data/2021/neurips/Universal Semi-Supervised Learning
 create mode 100644 data/2021/neurips/Unlabeled Principal Component Analysis
 create mode 100644 data/2021/neurips/Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning
 create mode 100644 data/2021/neurips/Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning
 create mode 100644 data/2021/neurips/Unsupervised Foreground Extraction via Deep Region Competition
 create mode 100644 data/2021/neurips/Unsupervised Learning of Compositional Energy Concepts
 create mode 100644 data/2021/neurips/Unsupervised Motion Representation Learning with Capsule Autoencoders
 create mode 100644 data/2021/neurips/Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport
 create mode 100644 data/2021/neurips/Unsupervised Object-Based Transition Models For 3D Partially Observable Environments
 create mode 100644 data/2021/neurips/Unsupervised Object-Level Representation Learning from Scene Images
 create mode 100644 data/2021/neurips/Unsupervised Part Discovery from Contrastive Reconstruction
 create mode 100644 data/2021/neurips/Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly
 create mode 100644 data/2021/neurips/Unsupervised Speech Recognition
 create mode 100644 data/2021/neurips/User-Level Differentially Private Learning via Correlated Sampling
 create mode 100644 data/2021/neurips/Using Random Effects to Account for High-Cardinality Categorical Features and Repeated Measures in Deep Neural Networks
 create mode 100644 data/2021/neurips/VAST: Value Function Factorization with Variable Agent Sub-Teams
 create mode 100644 data/2021/neurips/VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
 create mode 100644 data/2021/neurips/VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization
 create mode 100644 data/2021/neurips/Validating the Lottery Ticket Hypothesis with Inertial Manifold Theory
 create mode 100644 data/2021/neurips/Validation Free and Replication Robust Volume-based Data Valuation
 create mode 100644 data/2021/neurips/Variance-Aware Off-Policy Evaluation with Linear Function Approximation
 create mode 100644 data/2021/neurips/Variational Automatic Curriculum Learning for Sparse-Reward Cooperative
Multi-Agent Problems create mode 100644 data/2021/neurips/Variational Bayesian Optimistic Sampling create mode 100644 data/2021/neurips/Variational Bayesian Reinforcement Learning with Regret Bounds create mode 100644 data/2021/neurips/Variational Continual Bayesian Meta-Learning create mode 100644 data/2021/neurips/Variational Inference for Continuous-Time Switching Dynamical Systems create mode 100644 data/2021/neurips/Variational Model Inversion Attacks create mode 100644 data/2021/neurips/Variational Multi-Task Learning with Gumbel-Softmax Priors create mode 100644 data/2021/neurips/Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices create mode 100644 data/2021/neurips/Vector-valued Gaussian Processes on Riemannian Manifolds via Gauge Independent Projected Kernels create mode 100644 data/2021/neurips/ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction create mode 100644 data/2021/neurips/ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias create mode 100644 data/2021/neurips/VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer create mode 100644 data/2021/neurips/Video Instance Segmentation using Inter-Frame Communication Transformers create mode 100644 data/2021/neurips/VigDet: Knowledge Informed Neural Temporal Point Process for Coordination Detection on Social Media create mode 100644 data/2021/neurips/Visual Adversarial Imitation Learning using Variational Models create mode 100644 data/2021/neurips/Visual Search Asymmetry: Deep Nets and Humans Share Similar Inherent Biases create mode 100644 data/2021/neurips/Visualizing the Emergence of Intermediate Visual Patterns in DNNs create mode 100644 data/2021/neurips/VoiceMixer: Adversarial Voice Style Mixup create mode 100644 data/2021/neurips/Volume Rendering of Neural Implicit Surfaces create mode 100644 data/2021/neurips/Voxel-based 3D Detection and Reconstruction of Multiple Objects from a Single Image create mode 100644 data/2021/neurips/Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic create mode 100644 data/2021/neurips/Weak-shot Fine-grained Classification via Similarity Transfer create mode 100644 data/2021/neurips/Weighted model estimation for offline model-based reinforcement learning create mode 100644 data/2021/neurips/Weisfeiler and Lehman Go Cellular: CW Networks create mode 100644 data/2021/neurips/Well-tuned Simple Nets Excel on Tabular Datasets create mode 100644 data/2021/neurips/What Makes Multi-Modal Learning Better than Single (Provably) create mode 100644 data/2021/neurips/What Matters for Adversarial Imitation Learning? create mode 100644 data/2021/neurips/What can linearized neural networks actually say about generalization? create mode 100644 data/2021/neurips/What training reveals about neural network complexity create mode 100644 data/2021/neurips/What's a good imputation to predict with missing values? create mode 100644 data/2021/neurips/When Are Solutions Connected in Deep Networks? create mode 100644 data/2021/neurips/When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work create mode 100644 data/2021/neurips/When False Positive is Intolerant: End-to-End Optimization with Low FPR for Multipartite Ranking create mode 100644 data/2021/neurips/When Is Generalizable Reinforcement Learning Tractable? create mode 100644 data/2021/neurips/When Is Unsupervised Disentanglement Possible? 
create mode 100644 data/2021/neurips/When does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning? create mode 100644 data/2021/neurips/When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting create mode 100644 data/2021/neurips/Which Mutual-Information Representation Learning Objectives are Sufficient for Control? create mode 100644 data/2021/neurips/Who Leads and Who Follows in Strategic Classification? create mode 100644 data/2021/neurips/Why Do Better Loss Functions Lead to Less Transferable Features? create mode 100644 data/2021/neurips/Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning create mode 100644 data/2021/neurips/Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability create mode 100644 data/2021/neurips/Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Sparse Neural Networks create mode 100644 data/2021/neurips/Why Spectral Normalization Stabilizes GANs: Analysis and Improvements create mode 100644 data/2021/neurips/Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation create mode 100644 data/2021/neurips/Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark create mode 100644 data/2021/neurips/Wisdom of the Crowd Voting: Truthful Aggregation of Voter Information and Preferences create mode 100644 data/2021/neurips/Word2Fun: Modelling Words as Functions for Diachronic Word Representation create mode 100644 data/2021/neurips/XCiT: Cross-Covariance Image Transformers create mode 100644 data/2021/neurips/XDO: A Double Oracle Algorithm for Extensive-Form Games create mode 100644 data/2021/neurips/You Are the Best Reviewer of Your Own Papers: An Owner-Assisted Scoring Mechanism create mode 100644 data/2021/neurips/You Never Cluster Alone create mode 100644 data/2021/neurips/You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection create mode 100644 data/2021/neurips/You are caught stealing my winning lottery ticket! 
Making a lottery ticket claim its ownership create mode 100644 data/2021/neurips/Your head is there to move you around: Goal-driven models of the primate dorsal pathway create mode 100644 data/2021/neurips/Zero Time Waste: Recycling Predictions in Early Exit Neural Networks create mode 100644 data/2021/neurips/argmax centroid create mode 100644 data/2021/neurips/iFlow: Numerically Invertible Flows for Efficient Lossless Compression via a Uniform Coder create mode 100644 "data/2022/neurips/\"Lossless\" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach" create mode 100644 "data/2022/neurips/\"Why Not Other Classes?\": Towards Class-Contrastive Back-Propagation Explanations" create mode 100644 data/2022/neurips/$k$-Sliced Mutual Information: A Quantitative Study of Scalability with Dimension create mode 100644 data/2022/neurips/(De-)Randomized Smoothing for Decision Stump Ensembles create mode 100644 data/2022/neurips/(Optimal) Online Bipartite Matching with Degree Information create mode 100644 data/2022/neurips/360-MLC: Multi-view Layout Consistency for Self-training and Hyper-parameter Tuning create mode 100644 data/2022/neurips/3D Concept Grounding on Neural Fields create mode 100644 data/2022/neurips/3DB: A Framework for Debugging Computer Vision Models create mode 100644 data/2022/neurips/3DILG: Irregular Latent Grids for 3D Generative Modeling create mode 100644 data/2022/neurips/3DOS: Towards 3D Open Set Learning - Benchmarking and Understanding Semantic Novelty Detection on Point Clouds create mode 100644 data/2022/neurips/4D Unsupervised Object Discovery create mode 100644 data/2022/neurips/A Benchmark for Compositional Visual Reasoning create mode 100644 data/2022/neurips/A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback create mode 100644 data/2022/neurips/A Boosting Approach to Reinforcement Learning create mode 100644 data/2022/neurips/A Causal Analysis of Harm create mode 100644 data/2022/neurips/A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization create mode 100644 data/2022/neurips/A Characterization of Semi-Supervised Adversarially Robust PAC Learnability create mode 100644 data/2022/neurips/A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases create mode 100644 data/2022/neurips/A Closer Look at Offline RL Agents create mode 100644 data/2022/neurips/A Closer Look at Prototype Classifier for Few-shot Image Classification create mode 100644 data/2022/neurips/A Closer Look at Weakly-Supervised Audio-Visual Source Localization create mode 100644 data/2022/neurips/A Closer Look at the Adversarial Robustness of Deep Equilibrium Models create mode 100644 data/2022/neurips/A Combinatorial Perspective on the Optimization of Shallow ReLU Networks create mode 100644 data/2022/neurips/A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks create mode 100644 data/2022/neurips/A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning create mode 100644 data/2022/neurips/A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking create mode 100644 data/2022/neurips/A Conditional Randomization Test for Sparse Logistic Regression in High-Dimension create mode 100644 data/2022/neurips/A Consistent and Differentiable Lp Canonical Calibration Error Estimator create mode 100644 data/2022/neurips/A Consolidated Cross-Validation Algorithm for Support Vector Machines via Data 
Reduction create mode 100644 data/2022/neurips/A Continuous Time Framework for Discrete Denoising Models create mode 100644 data/2022/neurips/A Contrastive Framework for Neural Text Generation create mode 100644 data/2022/neurips/A Coupled Design of Exploiting Record Similarity for Practical Vertical Federated Learning create mode 100644 data/2022/neurips/A Data-Augmentation Is Worth A Thousand Samples: Analytical Moments And Sampling-Free Training create mode 100644 data/2022/neurips/A Dataset for Efforts Towards Achieving the Sustainable Development Goal of Safe Working Environments create mode 100644 data/2022/neurips/A Deep Learning Dataloader with Shared Data Preparation create mode 100644 data/2022/neurips/A Deep Reinforcement Learning Framework for Column Generation create mode 100644 data/2022/neurips/A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval create mode 100644 data/2022/neurips/A Differentially Private Linear-Time fPTAS for the Minimum Enclosing Ball Problem create mode 100644 data/2022/neurips/A Direct Approximation of AIXI Using Logical State Abstractions create mode 100644 data/2022/neurips/A Fast Post-Training Pruning Framework for Transformers create mode 100644 data/2022/neurips/A Fast Scale-Invariant Algorithm for Non-negative Least Squares with Non-negative Data create mode 100644 data/2022/neurips/A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation create mode 100644 data/2022/neurips/A Fourier Approach to Mixture Learning create mode 100644 data/2022/neurips/A General Framework for Auditing Differentially Private Machine Learning create mode 100644 data/2022/neurips/A Geometric Perspective on Variational Autoencoders create mode 100644 data/2022/neurips/A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis create mode 100644 data/2022/neurips/A Kernelised Stein Statistic for Assessing Implicit Generative Models create mode 100644 data/2022/neurips/A Lagrangian Duality Approach to Active Learning create mode 100644 data/2022/neurips/A Large Scale Search Dataset for Unbiased Learning to Rank create mode 100644 data/2022/neurips/A Lower Bound of Hash Codes' Performance create mode 100644 data/2022/neurips/A Mean-Field Game Approach to Cloud Resource Management with Function Approximation create mode 100644 data/2022/neurips/A Mixture Of Surprises for Unsupervised Reinforcement Learning create mode 100644 data/2022/neurips/A Multi-Resolution Framework for U-Nets with Applications to Hierarchical VAEs create mode 100644 data/2022/neurips/A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction create mode 100644 data/2022/neurips/A Multilabel Classification Framework for Approximate Nearest Neighbor Search create mode 100644 data/2022/neurips/A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs create mode 100644 data/2022/neurips/A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP create mode 100644 data/2022/neurips/A Neural Corpus Indexer for Document Retrieval create mode 100644 data/2022/neurips/A Neural Pre-Conditioning Active Learning Algorithm to Reduce Label Complexity create mode 100644 data/2022/neurips/A New Family of Generalization Bounds Using Samplewise Evaluated CMI create mode 100644 data/2022/neurips/A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models create mode 100644 data/2022/neurips/A Non-asymptotic 
Analysis of Non-parametric Temporal-Difference Learning create mode 100644 data/2022/neurips/A PAC-Bayesian Generalization Bound for Equivariant Networks create mode 100644 data/2022/neurips/A Policy-Guided Imitation Approach for Offline Reinforcement Learning create mode 100644 data/2022/neurips/A Practical, Progressively-Expressive GNN create mode 100644 data/2022/neurips/A Probabilistic Graph Coupling View of Dimension Reduction create mode 100644 data/2022/neurips/A Projection-free Algorithm for Constrained Stochastic Multi-level Composition Optimization create mode 100644 data/2022/neurips/A Quadrature Rule combining Control Variates and Adaptive Importance Sampling create mode 100644 data/2022/neurips/A Quantitative Geometric Approach to Neural-Network Smoothness create mode 100644 data/2022/neurips/A Reduction to Binary Approach for Debiasing Multiclass Datasets create mode 100644 data/2022/neurips/A Regret-Variance Trade-Off in Online Learning create mode 100644 data/2022/neurips/A Reparametrization-Invariant Sharpness Measure Based on Information Geometry create mode 100644 data/2022/neurips/A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits create mode 100644 data/2022/neurips/A Rotated Hyperbolic Wrapped Normal Distribution for Hierarchical Representation Learning create mode 100644 data/2022/neurips/A Scalable Deterministic Global Optimization Algorithm for Training Optimal Decision Tree create mode 100644 data/2022/neurips/A Simple Approach to Automated Spectral Clustering create mode 100644 data/2022/neurips/A Simple Decentralized Cross-Entropy Method create mode 100644 data/2022/neurips/A Simple and Optimal Policy Design for Online Learning with Safety against Heavy-tailed Risk create mode 100644 data/2022/neurips/A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits create mode 100644 data/2022/neurips/A Single-timescale Analysis for Stochastic Approximation with Multiple Coupled Sequences create mode 100644 data/2022/neurips/A Solver-free Framework for Scalable Learning in Neural ILP Architectures create mode 100644 data/2022/neurips/A Spectral Approach to Item Response Theory create mode 100644 data/2022/neurips/A Statistical Online Inference Approach in Averaged Stochastic Approximation create mode 100644 data/2022/neurips/A Stochastic Linearized Augmented Lagrangian Method for Decentralized Bilevel Optimization create mode 100644 data/2022/neurips/A Survey and Datasheet Repository of Publicly Available US Criminal Justice Datasets create mode 100644 data/2022/neurips/A Theoretical Framework for Inference Learning create mode 100644 data/2022/neurips/A Theoretical Study on Solving Continual Learning create mode 100644 data/2022/neurips/A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning create mode 100644 data/2022/neurips/A Theoretical View on Sparsely Activated Networks create mode 100644 data/2022/neurips/A Theory of PAC Learnability under Transformation Invariances create mode 100644 data/2022/neurips/A Transformer-Based Object Detector with Coarse-Fine Crossing Representations create mode 100644 data/2022/neurips/A Unified Analysis of Federated Learning with Arbitrary Client Participation create mode 100644 data/2022/neurips/A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective create mode 100644 data/2022/neurips/A Unified Convergence Theorem for Stochastic Optimization Methods create mode 100644 data/2022/neurips/A Unified Diversity 
Measure for Multiagent Reinforcement Learning create mode 100644 data/2022/neurips/A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks create mode 100644 data/2022/neurips/A Unified Framework for Alternating Offline Model Training and Policy Learning create mode 100644 data/2022/neurips/A Unified Framework for Deep Symbolic Regression create mode 100644 data/2022/neurips/A Unified Hard-Constraint Framework for Solving Geometrically Complex PDEs create mode 100644 data/2022/neurips/A Unified Model for Multi-class Anomaly Detection create mode 100644 data/2022/neurips/A Unified Sequence Interface for Vision Tasks create mode 100644 data/2022/neurips/A Unifying Framework for Online Optimization with Long-Term Constraints create mode 100644 data/2022/neurips/A Unifying Framework of Off-Policy General Value Function Evaluation create mode 100644 data/2022/neurips/A Universal Error Measure for Input Predictions Applied to Online Graph Problems create mode 100644 data/2022/neurips/A Variant of Anderson Mixing with Minimal Memory Size create mode 100644 data/2022/neurips/A Variational Edge Partition Model for Supervised Graph Representation Learning create mode 100644 data/2022/neurips/A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models create mode 100644 data/2022/neurips/A composable machine-learning approach for steady-state simulations on high-resolution grids create mode 100644 data/2022/neurips/A consistently adaptive trust-region method create mode 100644 data/2022/neurips/A contrastive rule for meta-learning create mode 100644 data/2022/neurips/A framework for bilevel optimization that enables stochastic and global variance reduction algorithms create mode 100644 data/2022/neurips/A gradient estimator via L1-randomization for online zero-order optimization with two point feedback create mode 100644 data/2022/neurips/A gradient sampling method with complexity guarantees for Lipschitz functions in high and low dimensions create mode 100644 data/2022/neurips/A new dataset for multilingual keyphrase generation create mode 100644 data/2022/neurips/A permutation-free kernel two-sample test create mode 100644 data/2022/neurips/A sharp NMF result with applications in network modeling create mode 100644 data/2022/neurips/A simple but strong baseline for online continual learning: Repeated Augmented Rehearsal create mode 100644 data/2022/neurips/A theory of weight distribution-constrained learning create mode 100644 data/2022/neurips/A time-resolved theory of information encoding in recurrent neural networks create mode 100644 data/2022/neurips/A2: Efficient Automated Attacker for Boosting Adversarial Training create mode 100644 data/2022/neurips/ACIL: Analytic Class-Incremental Learning with Absolute Memorization and Privacy Protection create mode 100644 data/2022/neurips/AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning create mode 100644 data/2022/neurips/ADBench: Anomaly Detection Benchmark create mode 100644 data/2022/neurips/ALIFE: Adaptive Logit Regularizer and Feature Replay for Incremental Semantic Segmentation create mode 100644 data/2022/neurips/ALMA: Hierarchical Learning for Composite Multi-Agent Tasks create mode 100644 data/2022/neurips/AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation create mode 100644 data/2022/neurips/AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness create mode 100644 data/2022/neurips/APG: Adaptive Parameter Generation 
Network for Click-Through Rate Prediction create mode 100644 data/2022/neurips/APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking create mode 100644 data/2022/neurips/ASPiRe: Adaptive Skill Priors for Reinforcement Learning create mode 100644 data/2022/neurips/ATD: Augmenting CP Tensor Decomposition by Self Supervision create mode 100644 data/2022/neurips/AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning create mode 100644 data/2022/neurips/AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments create mode 100644 data/2022/neurips/AZ-whiteness test: a test for signal uncorrelation on spatio-temporal graphs create mode 100644 data/2022/neurips/Accelerated Linearized Laplace Approximation for Bayesian Deep Learning create mode 100644 data/2022/neurips/Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave Saddle-Point Problems with Bilinear Coupling create mode 100644 data/2022/neurips/Accelerated Projected Gradient Algorithms for Sparsity Constrained Optimization Problems create mode 100644 data/2022/neurips/Accelerated Training of Physics-Informed Neural Networks (PINNs) using Meshless Discretizations create mode 100644 data/2022/neurips/Accelerating Certified Robustness Training via Knowledge Transfer create mode 100644 data/2022/neurips/Accelerating SGD for Highly Ill-Conditioned Huge-Scale Online Matrix Completion create mode 100644 data/2022/neurips/Accelerating Sparse Convolution with Column Vector-Wise Sparsity create mode 100644 data/2022/neurips/Acceleration in Distributed Sparse Regression create mode 100644 data/2022/neurips/Action-modulated midbrain dopamine activity arises from distributed control policies create mode 100644 data/2022/neurips/ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment create mode 100644 data/2022/neurips/Active Bayesian Causal Inference create mode 100644 data/2022/neurips/Active Exploration for Inverse Reinforcement Learning create mode 100644 data/2022/neurips/Active Labeling: Streaming Stochastic Gradients create mode 100644 data/2022/neurips/Active Learning Helps Pretrained Models Learn the Intended Task create mode 100644 data/2022/neurips/Active Learning Polynomial Threshold Functions create mode 100644 data/2022/neurips/Active Learning Through a Covering Lens create mode 100644 data/2022/neurips/Active Learning for Multiple Target Models create mode 100644 data/2022/neurips/Active Learning of Classifiers with Label and Seed Queries create mode 100644 data/2022/neurips/Active Learning with Neural Networks: Insights from Nonparametric Statistics create mode 100644 data/2022/neurips/Active Learning with Safety Constraints create mode 100644 data/2022/neurips/Active Ranking without Strong Stochastic Transitivity create mode 100644 data/2022/neurips/Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation create mode 100644 data/2022/neurips/Active-Passive SimStereo - Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods create mode 100644 data/2022/neurips/AdaFocal: Calibration-aware Adaptive Focal Loss create mode 100644 data/2022/neurips/Adam Can Converge Without Any Modification On Update Rules create mode 100644 data/2022/neurips/AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition create mode 100644 data/2022/neurips/Adaptation Accelerating Sampling-based Bayesian Inference in Attractor 
Neural Networks create mode 100644 data/2022/neurips/Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency create mode 100644 data/2022/neurips/Adapting to Online Label Shift with Provable Guarantees create mode 100644 data/2022/neurips/Adaptive Data Debiasing through Bounded Exploration create mode 100644 data/2022/neurips/Adaptive Distribution Calibration for Few-Shot Learning with Hierarchical Optimal Transport create mode 100644 data/2022/neurips/Adaptive Interest for Emphatic Reinforcement Learning create mode 100644 data/2022/neurips/Adaptive Multi-stage Density Ratio Estimation for Learning Latent Space Energy-based Model create mode 100644 data/2022/neurips/Adaptive Oracle-Efficient Online Learning create mode 100644 data/2022/neurips/Adaptive Sampling for Discovery create mode 100644 data/2022/neurips/Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization create mode 100644 data/2022/neurips/Adaptively Exploiting d-Separators with Causal Bandits create mode 100644 data/2022/neurips/Additive MIL: Intrinsically Interpretable Multiple Instance Learning for Pathology create mode 100644 data/2022/neurips/Addressing Leakage in Concept Bottleneck Models create mode 100644 data/2022/neurips/Addressing Resource Scarcity across Sign Languages with Multilingual Pretraining and Unified-Vocabulary Datasets create mode 100644 data/2022/neurips/Adjoint-aided inference of Gaussian process driven differential equations create mode 100644 data/2022/neurips/Adv-Attribute: Inconspicuous and Transferable Adversarial Attack on Face Recognition create mode 100644 data/2022/neurips/Advancing Model Pruning via Bi-level Optimization create mode 100644 data/2022/neurips/Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks create mode 100644 data/2022/neurips/Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach create mode 100644 data/2022/neurips/Adversarial Reprogramming Revisited create mode 100644 data/2022/neurips/Adversarial Robustness is at Odds with Lazy Training create mode 100644 data/2022/neurips/Adversarial Style Augmentation for Domain Generalized Urban-Scene Segmentation create mode 100644 data/2022/neurips/Adversarial Task Up-sampling for Meta-learning create mode 100644 data/2022/neurips/Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks create mode 100644 data/2022/neurips/Adversarial Unlearning: Reducing Confidence Along Adversarial Directions create mode 100644 data/2022/neurips/Adversarial training for high-stakes reliability create mode 100644 data/2022/neurips/Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization create mode 100644 data/2022/neurips/AgraSSt: Approximate Graph Stein Statistics for Interpretable Assessment of Implicit Graph Generators create mode 100644 data/2022/neurips/Agreement-on-the-line: Predicting the Performance of Neural Networks under Distribution Shift create mode 100644 data/2022/neurips/AirfRANS: High Fidelity Computational Fluid Dynamics Dataset for Approximating Reynolds-Averaged Navier-Stokes Solutions create mode 100644 data/2022/neurips/Algorithms and Hardness for Learning Linear Thresholds from Label Proportions create mode 100644 data/2022/neurips/Algorithms that Approximate Data Removal: New Results and Limitations create mode 100644 data/2022/neurips/Algorithms with Prediction Portfolios create mode 100644 
data/2022/neurips/Align then Fusion: Generalized Large-scale Multi-view Clustering with Anchor Matching Correspondences create mode 100644 data/2022/neurips/Aligning individual brains with fused unbalanced Gromov Wasserstein create mode 100644 data/2022/neurips/Alignment-guided Temporal Attention for Video Action Recognition create mode 100644 data/2022/neurips/All Politics is Local: Redistricting via Local Fairness create mode 100644 "data/2022/neurips/Alleviating \"Posterior Collapse\" in Deep Topic Models via Policy Gradient" create mode 100644 data/2022/neurips/Alleviating Adversarial Attacks on Variational Autoencoders with MCMC create mode 100644 data/2022/neurips/Alleviating the Sample Selection Bias in Few-shot Learning by Removing Projection to the Centroid create mode 100644 data/2022/neurips/Alternating Mirror Descent for Constrained Min-Max Games create mode 100644 data/2022/neurips/Ambiguous Images With Human Judgments for Robust Visual Event Classification create mode 100644 data/2022/neurips/Amortized Inference for Causal Structure Learning create mode 100644 data/2022/neurips/Amortized Inference for Heterogeneous Reconstruction in Cryo-EM create mode 100644 data/2022/neurips/Amortized Mixing Coupling Processes for Clustering create mode 100644 data/2022/neurips/Amortized Projection Optimization for Sliced Wasserstein Generative Models create mode 100644 data/2022/neurips/Amortized Proximal Optimization create mode 100644 data/2022/neurips/Amplifying Membership Exposure via Data Poisoning create mode 100644 data/2022/neurips/An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context create mode 100644 data/2022/neurips/An Adaptive Kernel Approach to Federated Learning of Heterogeneous Causal Effects create mode 100644 data/2022/neurips/An Algorithm for Learning Switched Linear Dynamics from Data create mode 100644 data/2022/neurips/An Analysis of Ensemble Sampling create mode 100644 data/2022/neurips/An Analytical Theory of Curriculum Learning in Teacher-Student Networks create mode 100644 data/2022/neurips/An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem create mode 100644 data/2022/neurips/An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning create mode 100644 data/2022/neurips/An Empirical Study on Disentanglement of Negative-free Contrastive Learning create mode 100644 data/2022/neurips/An In-depth Study of Stochastic Backpropagation create mode 100644 data/2022/neurips/An Information-Theoretic Framework for Deep Learning create mode 100644 data/2022/neurips/An Investigation into Whitening Loss for Self-supervised Learning create mode 100644 data/2022/neurips/An efficient graph generative model for navigating ultra-large combinatorial synthesis libraries create mode 100644 data/2022/neurips/An empirical analysis of compute-optimal large language model training create mode 100644 data/2022/neurips/Analyzing Data-Centric Properties for Graph Contrastive Learning create mode 100644 data/2022/neurips/Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective create mode 100644 data/2022/neurips/Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability create mode 100644 data/2022/neurips/Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning create mode 100644 data/2022/neurips/AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars create mode 100644 data/2022/neurips/AnimeRun: 2D Animation Visual 
Correspondence from Open Source 3D Movies create mode 100644 data/2022/neurips/AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos create mode 100644 data/2022/neurips/Annihilation of Spurious Minima in Two-Layer ReLU Networks create mode 100644 data/2022/neurips/AnoShift: A Distribution Shift Benchmark for Unsupervised Anomaly Detection create mode 100644 data/2022/neurips/Anonymized Histograms in Intermediate Privacy Models create mode 100644 data/2022/neurips/Anonymous Bandits for Multi-User Systems create mode 100644 data/2022/neurips/Anticipating Performativity by Predicting from Predictions create mode 100644 data/2022/neurips/Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures create mode 100644 data/2022/neurips/Anytime-Valid Inference For Multinomial Count Data create mode 100644 data/2022/neurips/Approaching Quartic Convergence Rates for Quasi-Stochastic Approximation with Application to Gradient-Free Optimization create mode 100644 data/2022/neurips/Approximate Euclidean lengths and distances beyond Johnson-Lindenstrauss create mode 100644 data/2022/neurips/Approximate Secular Equations for the Cubic Regularization Subproblem create mode 100644 data/2022/neurips/Approximate Value Equivalence create mode 100644 data/2022/neurips/Approximation with CNNs in Sobolev Space: with Applications to Classification create mode 100644 data/2022/neurips/Archimedes Meets Privacy: On Privately Estimating Quantiles in High Dimensions Under Minimal Assumptions create mode 100644 data/2022/neurips/Are All Losses Created Equal: A Neural Collapse Perspective create mode 100644 data/2022/neurips/Are AlphaZero-like Agents Robust to Adversarial Perturbations? create mode 100644 data/2022/neurips/Are Defenses for Graph Neural Networks Robust? create mode 100644 data/2022/neurips/Are GANs overkill for NLP? create mode 100644 data/2022/neurips/Are Two Heads the Same as One? Identifying Disparate Treatment in Fair Neural Networks create mode 100644 data/2022/neurips/Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks create mode 100644 data/2022/neurips/Are all Frames Equal? 
Active Sparse Labeling for Video Action Detection create mode 100644 data/2022/neurips/Ask4Help: Learning to Leverage an Expert for Embodied Tasks create mode 100644 data/2022/neurips/Assaying Out-Of-Distribution Generalization in Transfer Learning create mode 100644 data/2022/neurips/Assistive Teaching of Motor Control Tasks to Humans create mode 100644 data/2022/neurips/Associating Objects and Their Effects in Video through Coordination Games create mode 100644 data/2022/neurips/Association Graph Learning for Multi-Task Classification with Category Shifts create mode 100644 data/2022/neurips/Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again create mode 100644 data/2022/neurips/Asymptotic Behaviors of Projected Stochastic Approximation: A Jump Diffusion Perspective create mode 100644 data/2022/neurips/Asymptotic Properties for Bayesian Neural Network in Besov Space create mode 100644 data/2022/neurips/Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm create mode 100644 data/2022/neurips/Asymptotics of smoothed Wasserstein distances in the small noise regime create mode 100644 "data/2022/neurips/Asymptotics of \342\204\2232 Regularized Network Embeddings" create mode 100644 data/2022/neurips/Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning create mode 100644 data/2022/neurips/Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays create mode 100644 data/2022/neurips/AttCAT: Explaining Transformers via Attentive Class Activation Tokens create mode 100644 data/2022/neurips/Attention-based Neural Cellular Automata create mode 100644 data/2022/neurips/Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation create mode 100644 data/2022/neurips/Audio-Driven Co-Speech Gesture Video Generation create mode 100644 data/2022/neurips/Augmentations in Hypergraph Contrastive Learning: Fabricated and Generative create mode 100644 data/2022/neurips/Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems create mode 100644 data/2022/neurips/AutoLink: Self-supervised Learning of Human Skeletons and Object Outlines by Linking Keypoints create mode 100644 data/2022/neurips/AutoML Two-Sample Test create mode 100644 data/2022/neurips/AutoMS: Automatic Model Selection for Novelty Detection with Error Rate Control create mode 100644 data/2022/neurips/AutoMTL: A Programming Framework for Automating Efficient Multi-Task Learning create mode 100644 data/2022/neurips/AutoST: Towards the Universal Modeling of Spatio-temporal Sequences create mode 100644 data/2022/neurips/AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels create mode 100644 data/2022/neurips/Autoformalization with Large Language Models create mode 100644 data/2022/neurips/Autoinverse: Uncertainty Aware Inversion of Neural Networks create mode 100644 data/2022/neurips/Automatic Differentiation of Programs with Discrete Randomness create mode 100644 data/2022/neurips/Automatic differentiation of nonsmooth iterative algorithms create mode 100644 data/2022/neurips/Autoregressive Perturbations for Data Poisoning create mode 100644 data/2022/neurips/Autoregressive Search Engines: Generating Substrings as Document Identifiers create mode 100644 data/2022/neurips/Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds create mode 100644 data/2022/neurips/Average Sensitivity of Euclidean k-Clustering create mode 100644 data/2022/neurips/BEER: Fast $O(1 T)$ Rate for Decentralized 
Nonconvex Optimization with Communication Compression create mode 100644 data/2022/neurips/BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework create mode 100644 data/2022/neurips/BILCO: An Efficient Algorithm for Joint Alignment of Time Series create mode 100644 data/2022/neurips/BLOX: Macro Neural Architecture Search Benchmark and Algorithms create mode 100644 data/2022/neurips/BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling create mode 100644 data/2022/neurips/BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach create mode 100644 data/2022/neurips/BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs create mode 100644 data/2022/neurips/BR-SNIS: Bias Reduced Self-Normalized Importance Sampling create mode 100644 data/2022/neurips/BYOL-Explore: Exploration by Bootstrapped Prediction create mode 100644 data/2022/neurips/Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation create mode 100644 data/2022/neurips/BackdoorBench: A Comprehensive Benchmark of Backdoor Learning create mode 100644 data/2022/neurips/BadPrompt: Backdoor Attacks on Continuous Prompts create mode 100644 data/2022/neurips/BagFlip: A Certified Defense Against Data Poisoning create mode 100644 data/2022/neurips/Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization create mode 100644 data/2022/neurips/Batch Bayesian Optimization on Permutations using the Acquisition Weighted Kernel create mode 100644 data/2022/neurips/Batch Bayesian optimisation via density-ratio estimation with guarantees create mode 100644 data/2022/neurips/Batch Multi-Fidelity Active Learning with Budget Constraints create mode 100644 data/2022/neurips/Batch size-invariance for policy optimization create mode 100644 data/2022/neurips/Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms create mode 100644 data/2022/neurips/BayesPCN: A Continually Learnable Predictive Coding Associative Memory create mode 100644 data/2022/neurips/Bayesian Active Learning with Fully Bayesian Gaussian Processes create mode 100644 data/2022/neurips/Bayesian Clustering of Neural Spiking Activity Using a Mixture of Dynamic Poisson Factor Analyzers create mode 100644 data/2022/neurips/Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning create mode 100644 data/2022/neurips/Bayesian Optimization over Discrete and Mixed Spaces via Probabilistic Reparameterization create mode 100644 data/2022/neurips/Bayesian Persuasion for Algorithmic Recourse create mode 100644 data/2022/neurips/Bayesian Risk Markov Decision Processes create mode 100644 data/2022/neurips/Bayesian Spline Learning for Equation Discovery of Nonlinear Dynamics with Quantified Uncertainty create mode 100644 data/2022/neurips/Bayesian inference via sparse Hamiltonian flows create mode 100644 data/2022/neurips/Behavior Transformers: Cloning $k$ modes with one stone create mode 100644 data/2022/neurips/Bellman Residual Orthogonalization for Offline Reinforcement Learning create mode 100644 data/2022/neurips/Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability create mode 100644 data/2022/neurips/Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms create mode 100644 data/2022/neurips/Benchopt: Reproducible, efficient and collaborative optimization benchmarks create mode 100644 
data/2022/neurips/Benefits of Additive Noise in Composing Classes with Bounded Capacity create mode 100644 data/2022/neurips/Benefits of Permutation-Equivariance in Auction Mechanisms create mode 100644 data/2022/neurips/Benign Overfitting in Two-layer Convolutional Neural Networks create mode 100644 data/2022/neurips/Benign Underfitting of Stochastic Gradient Descent create mode 100644 data/2022/neurips/Benign, Tempered, or Catastrophic: Toward a Refined Taxonomy of Overfitting create mode 100644 data/2022/neurips/Bessel Equivariant Networks for Inversion of Transmission Effects in Multi-Mode Optical Fibres create mode 100644 data/2022/neurips/Best of Both Worlds Model Selection create mode 100644 data/2022/neurips/Better Best of Both Worlds Bounds for Bandits with Switching Costs create mode 100644 data/2022/neurips/Better SGD using Second-order Momentum create mode 100644 data/2022/neurips/Better Uncertainty Calibration via Proper Scores for Classification and Beyond create mode 100644 data/2022/neurips/Between Stochastic and Adversarial Online Convex Optimization: Improved Regret Bounds via Smoothness create mode 100644 data/2022/neurips/Beyond Adult and COMPAS: Fair Multi-Class Prediction via Information Projection create mode 100644 data/2022/neurips/Beyond IID: data-driven decision-making in heterogeneous environments create mode 100644 data/2022/neurips/Beyond L1: Faster and Better Sparse Models with skglm create mode 100644 data/2022/neurips/Beyond Mahalanobis Distance for Textual OOD Detection create mode 100644 data/2022/neurips/Beyond Not-Forgetting: Continual Learning with Backward Knowledge Transfer create mode 100644 data/2022/neurips/Beyond Real-world Benchmark Datasets: An Empirical Study of Node Classification with GNNs create mode 100644 data/2022/neurips/Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis create mode 100644 data/2022/neurips/Beyond Separability: Analyzing the Linear Transferability of Contrastive Representations to Related Subpopulations create mode 100644 data/2022/neurips/Beyond Time-Average Convergence: Near-Optimal Uncoupled Online Learning via Clairvoyant Multiplicative Weights Update create mode 100644 data/2022/neurips/Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules create mode 100644 data/2022/neurips/Beyond black box densities: Parameter learning for the deviated components create mode 100644 data/2022/neurips/Beyond neural scaling laws: beating power law scaling via data pruning create mode 100644 data/2022/neurips/Beyond spectral gap: the role of the topology in decentralized learning create mode 100644 data/2022/neurips/Beyond the Best: Distribution Functional Estimation in Infinite-Armed Bandits create mode 100644 data/2022/neurips/Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions create mode 100644 data/2022/neurips/Bezier Gaussian Processes for Tall and Wide Data create mode 100644 data/2022/neurips/Bi-directional Weakly Supervised Knowledge Distillation for Whole Slide Image Classification create mode 100644 data/2022/neurips/BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons create mode 100644 data/2022/neurips/BiT: Robustly Binarized Multi-distilled Transformer create mode 100644 data/2022/neurips/Bidirectional Learning for Offline Infinite-width Model-based Optimization create mode 100644 data/2022/neurips/BigBio: A Framework for Data-Centric Biomedical Natural Language Processing create 
mode 100644 data/2022/neurips/BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
create mode 100644 data/2022/neurips/Biological Learning of Irreducible Representations of Commuting Transformations
create mode 100644 data/2022/neurips/Biologically Inspired Dynamic Thresholds for Spiking Neural Networks
create mode 100644 data/2022/neurips/Biologically plausible solutions for spiking networks with efficient coding
create mode 100644 data/2022/neurips/Biologically-Plausible Determinant Maximization Neural Networks for Blind Separation of Correlated Sources
create mode 100644 data/2022/neurips/Biologically-plausible backpropagation through arbitrary timespans via local neuromodulators
create mode 100644 data/2022/neurips/Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation
create mode 100644 data/2022/neurips/Black-Box Generalization: Stability of Zeroth-Order Learning
create mode 100644 data/2022/neurips/Black-box coreset variational inference
create mode 100644 data/2022/neurips/Blackbox Attacks via Surrogate Ensemble Search
create mode 100644 data/2022/neurips/Blessing of Depth in Linear Regression: Deeper Models Have Flatter Landscape Around the True Solution
create mode 100644 data/2022/neurips/Block-Recurrent Transformers
create mode 100644 data/2022/neurips/Boosting Barely Robust Learners: A New Perspective on Adversarial Robustness
create mode 100644 data/2022/neurips/Boosting Out-of-distribution Detection with Typical Features
create mode 100644 data/2022/neurips/Boosting the Performance of Generic Deep Neural Network Frameworks with Log-supermodular CRFs
create mode 100644 data/2022/neurips/Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation
create mode 100644 data/2022/neurips/Bootstrapped Transformer for Offline Reinforcement Learning
create mode 100644 data/2022/neurips/Bounded-Regret MPC via Perturbation Analysis: Prediction Error, Constraints, and Nonlinearity
create mode 100644 data/2022/neurips/Bounding and Approximating Intersectional Fairness through Marginal Fairness
create mode 100644 data/2022/neurips/Brain Network Transformer
create mode 100644 data/2022/neurips/Branch & Learn for Recursively and Iteratively Solvable Problems in Predict+Optimize
create mode 100644 data/2022/neurips/Breaking Bad: A Dataset for Geometric Fracture and Reassembly
create mode 100644 data/2022/neurips/Bridge the Gap Between Architecture Spaces via A Cross-Domain Predictor
create mode 100644 data/2022/neurips/Bridging Central and Local Differential Privacy in Data Acquisition Mechanisms
create mode 100644 data/2022/neurips/Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
create mode 100644 data/2022/neurips/Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
create mode 100644 data/2022/neurips/Bridging the Gap from Asymmetry Tricks to Decorrelation Principles in Non-contrastive Self-supervised Learning
create mode 100644 data/2022/neurips/Bridging the Gap: Unifying the Training and Evaluation of Neural Network Binary Classifiers
create mode 100644 data/2022/neurips/Bring Your Own Algorithm for Optimal Differentially Private Stochastic Minimax Optimization
create mode 100644 data/2022/neurips/Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
create mode 100644 data/2022/neurips/Brownian Noise Reduction: Maximizing Privacy Subject to Accuracy Constraints
create mode 100644 data/2022/neurips/Byzantine Spectral Ranking
create mode 100644 data/2022/neurips/Byzantine-tolerant federated Gaussian process regression for streaming data
create mode 100644 data/2022/neurips/C-Mixup: Improving Generalization in Regression
create mode 100644 data/2022/neurips/C2FAR: Coarse-to-Fine Autoregressive Networks for Precise Probabilistic Forecasting
create mode 100644 data/2022/neurips/CAESAR: An Embodied Simulator for Generating Multimodal Referring Expression Datasets
create mode 100644 data/2022/neurips/CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds
create mode 100644 data/2022/neurips/CARD: Classification and Regression Diffusion Models
create mode 100644 data/2022/neurips/CARLANE: A Lane Detection Benchmark for Unsupervised Domain Adaptation from Simulation to multiple Real-World Domains
create mode 100644 data/2022/neurips/CASA: Category-agnostic Skeletal Animal Reconstruction
create mode 100644 data/2022/neurips/CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
create mode 100644 data/2022/neurips/CCCP is Frank-Wolfe in disguise
create mode 100644 data/2022/neurips/CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior
create mode 100644 data/2022/neurips/CEDe: A collection of expert-curated datasets with atom-level entity annotations for Optical Chemical Structure Recognition
create mode 100644 data/2022/neurips/CEIP: Combining Explicit and Implicit Priors for Reinforcement Learning with Demonstrations
create mode 100644 data/2022/neurips/CGLB: Benchmark Tasks for Continual Graph Learning
create mode 100644 data/2022/neurips/CHIMLE: Conditional Hierarchical IMLE for Multimodal Conditional Image Synthesis
create mode 100644 data/2022/neurips/CLEAR: Generative Counterfactual Explanations on Graphs
create mode 100644 data/2022/neurips/CLEVRER-Humans: Describing Physical and Causal Events the Human Way
create mode 100644 data/2022/neurips/CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders
create mode 100644 data/2022/neurips/CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
create mode 100644 data/2022/neurips/CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
create mode 100644 data/2022/neurips/COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics
create mode 100644 data/2022/neurips/CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification
create mode 100644 data/2022/neurips/CUP: Critic-Guided Policy Reuse
create mode 100644 data/2022/neurips/Cache-Augmented Inbatch Importance Resampling for Training Recommender Retriever
create mode 100644 data/2022/neurips/CageNeRF: Cage-based Neural Radiance Field for Generalized 3D Deformation and Animation
create mode 100644 data/2022/neurips/CalFAT: Calibrated Federated Adversarial Training with Label Skewness
create mode 100644 data/2022/neurips/Calibrated Data-Dependent Constraints with Exact Satisfaction Guarantees
create mode 100644 data/2022/neurips/Can Adversarial Training Be Manipulated By Non-Robust Features?
create mode 100644 data/2022/neurips/Can Hybrid Geometric Scattering Networks Help Solve the Maximum Clique Problem?
create mode 100644 data/2022/neurips/Can Push-forward Generative Models Fit Multimodal Distributions?
create mode 100644 data/2022/neurips/Capturing Failures of Large Language Models via Human Cognitive Biases
create mode 100644 data/2022/neurips/Capturing Graphs with Hypo-Elliptic Diffusions
create mode 100644 data/2022/neurips/CascadeXML: Rethinking Transformers for End-to-end Multi-resolution Training in Extreme Multi-label Classification
create mode 100644 data/2022/neurips/Category-Level 6D Object Pose Estimation in the Wild: A Semi-Supervised Learning Approach and A New Dataset
create mode 100644 data/2022/neurips/Causal Discovery in Heterogeneous Environments Under the Sparse Mechanism Shift Hypothesis
create mode 100644 data/2022/neurips/Causal Discovery in Linear Latent Variable Models Subject to Measurement Error
create mode 100644 data/2022/neurips/Causal Identification under Markov equivalence: Calculus, Algorithm, and Completeness
create mode 100644 data/2022/neurips/Causal Inference with Non-IID Data using Linear Graphical Models
create mode 100644 data/2022/neurips/Causality Preserving Chaotic Transformation and Classification using Neurochaos Learning
create mode 100644 data/2022/neurips/Causality-driven Hierarchical Structure Discovery for Reinforcement Learning
create mode 100644 data/2022/neurips/Causally motivated multi-shortcut identification and removal
create mode 100644 data/2022/neurips/Censored Quantile Regression Neural Networks for Distribution-Free Survival Analysis
create mode 100644 data/2022/neurips/Certifying Robust Graph Classification under Orthogonal Gromov-Wasserstein Threats
create mode 100644 data/2022/neurips/Certifying Some Distributional Fairness with Subpopulation Decomposition
create mode 100644 data/2022/neurips/Chain of Thought Imitation with Procedure Cloning
create mode 100644 data/2022/neurips/Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
create mode 100644 data/2022/neurips/Challenging Common Assumptions in Convex Reinforcement Learning
create mode 100644 data/2022/neurips/Change Event Dataset for Discovery from Spatio-temporal Remote Sensing Imagery
create mode 100644 data/2022/neurips/Change-point Detection for Sparse and Dense Functional Data in General Dimensions
create mode 100644 data/2022/neurips/Chaotic Dynamics are Intrinsic to Neural Network Training with SGD
create mode 100644 data/2022/neurips/Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent
create mode 100644 data/2022/neurips/Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
create mode 100644 data/2022/neurips/Characterization of Excess Risk for Locally Strongly Convex Population Risk
create mode 100644 data/2022/neurips/Characterizing the Ventral Visual Stream with Response-Optimized Neural Encoding Models
create mode 100644 data/2022/neurips/Chartalist: Labeled Graph Datasets for UTXO and Account-based Blockchains
create mode 100644 data/2022/neurips/Chefs' Random Tables: Non-Trigonometric Random Features
create mode 100644 data/2022/neurips/Chroma-VAE: Mitigating Shortcut Learning with Generative Classifiers
create mode 100644 data/2022/neurips/Chromatic Correlation Clustering, Revisited
create mode 100644 data/2022/neurips/Class-Aware Adversarial Transformers for Medical Image Segmentation
create mode 100644 data/2022/neurips/Class-Dependent Label-Noise Learning with Cycle-Consistency Regularization
create mode 100644 data/2022/neurips/ClimbQ: Class Imbalanced Quantization Enabling Robustness on Efficient Inferences
create mode 100644 data/2022/neurips/Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise
create mode 100644 data/2022/neurips/Cluster Randomized Designs for One-Sided Bipartite Experiments
create mode 100644 data/2022/neurips/Cluster and Aggregate: Face Recognition with Large Probe Set
create mode 100644 data/2022/neurips/Co-Modality Graph Contrastive Learning for Imbalanced Node Classification
create mode 100644 data/2022/neurips/CoNSoLe: Convex Neural Symbolic Learning
create mode 100644 data/2022/neurips/CoNT: Contrastive Neural Text Generation
create mode 100644 data/2022/neurips/CoPur: Certifiably Robust Collaborative Inference via Feature Purification
create mode 100644 data/2022/neurips/Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
create mode 100644 data/2022/neurips/CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
create mode 100644 data/2022/neurips/Coded Residual Transform for Generalizable Deep Metric Learning
create mode 100644 data/2022/neurips/CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
create mode 100644 data/2022/neurips/Collaborative Decision Making Using Action Suggestions
create mode 100644 data/2022/neurips/Collaborative Learning by Detecting Collaboration Partners
create mode 100644 data/2022/neurips/Collaborative Learning of Discrete Distributions under Heterogeneity and Communication Constraints
create mode 100644 data/2022/neurips/Collaborative Linear Bandits with Adversarial Agents: Near-Optimal Regret Bounds
create mode 100644 data/2022/neurips/ComENet: Towards Complete and Efficient Message Passing for 3D Molecular Graphs
create mode 100644 data/2022/neurips/ComGAN: Unsupervised Disentanglement and Segmentation via Image Composition
create mode 100644 data/2022/neurips/ComMU: Dataset for Combinatorial Music Generation
create mode 100644 data/2022/neurips/Combinatorial Bandits with Linear Constraints: Beyond Knapsacks and Fairness
create mode 100644 data/2022/neurips/Combining Explicit and Implicit Regularization for Efficient Learning in Deep Networks
create mode 100644 data/2022/neurips/Communicating Natural Programs to Humans and Machines
create mode 100644 data/2022/neurips/Communication Acceleration of Local Gradient Methods via an Accelerated Primal-Dual Algorithm with an Inexact Prox
create mode 100644 data/2022/neurips/Communication Efficient Distributed Learning for Kernelized Contextual Bandits
create mode 100644 data/2022/neurips/Communication Efficient Federated Learning for Generalized Linear Bandits
create mode 100644 data/2022/neurips/Communication-Efficient Topologies for Decentralized Learning with $O(1)$ Consensus Rate
create mode 100644 data/2022/neurips/Communication-efficient distributed eigenspace estimation with arbitrary node failures
create mode 100644 data/2022/neurips/Composite Feature Selection Using Deep Ensembles
create mode 100644 data/2022/neurips/Composition Theorems for Interactive Differential Privacy
create mode 100644 data/2022/neurips/Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language
create mode 100644 data/2022/neurips/Compositional generalization through abstract representations in human and artificial neural networks
create mode 100644 data/2022/neurips/Compressible-composable NeRF via Rank-residual Decomposition
create mode 100644 data/2022/neurips/Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs
create mode 100644 data/2022/neurips/Concentration of Data Encoding in Parameterized Quantum Circuits
create mode 100644 data/2022/neurips/Concept Activation Regions: A Generalized Framework For Concept-Based Explanations
create mode 100644 data/2022/neurips/Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off
create mode 100644 data/2022/neurips/Concrete Score Matching: Generalized Score Matching for Discrete Data
create mode 100644 data/2022/neurips/Conditional Diffusion Process for Inverse Halftoning
create mode 100644 data/2022/neurips/Conditional Independence Testing with Heteroskedastic Data and Applications to Causal Discovery
create mode 100644 data/2022/neurips/Conditional Meta-Learning of Linear Representations
create mode 100644 data/2022/neurips/ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild
create mode 100644 data/2022/neurips/Confidence-based Reliable Learning under Dual Noises
create mode 100644 data/2022/neurips/Confident Adaptive Language Modeling
create mode 100644 data/2022/neurips/Conformal Frequency Estimation with Sketched Data
create mode 100644 data/2022/neurips/Conformal Off-Policy Prediction in Contextual Bandits
create mode 100644 data/2022/neurips/Conformal Prediction with Temporal Quantile Adjustments
create mode 100644 data/2022/neurips/Conformalized Fairness via Quantile Regression
create mode 100644 data/2022/neurips/ConfounderGAN: Protecting Image Data Privacy with Causal Confounder
create mode 100644 data/2022/neurips/Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
create mode 100644 data/2022/neurips/Consistency of Constrained Spectral Clustering under Graph Induced Fair Planted Partitions
create mode 100644 data/2022/neurips/Consistent Interpolating Ensembles via the Manifold-Hilbert Kernel
create mode 100644 data/2022/neurips/Consistent Sufficient Explanations and Minimal Local Rules for explaining the decision of any classifier or regressor
create mode 100644 data/2022/neurips/Constants of motion network
create mode 100644 data/2022/neurips/Constrained GPI for Zero-Shot Transfer in Reinforcement Learning
create mode 100644 data/2022/neurips/Constrained Langevin Algorithms with L-mixing External Random Variables
create mode 100644 data/2022/neurips/Constrained Predictive Coding as a Biologically Plausible Model of the Cortical Hierarchy
create mode 100644 data/2022/neurips/Constrained Stochastic Nonconvex Optimization with State-dependent Markov Data
create mode 100644 data/2022/neurips/Constrained Update Projection Approach to Safe Policy Optimization
create mode 100644 data/2022/neurips/Constraining Gaussian Processes to Systems of Linear Ordinary Differential Equations
create mode 100644 data/2022/neurips/Contact-aware Human Motion Forecasting
create mode 100644 data/2022/neurips/Context-Based Dynamic Pricing with Partially Linear Demand Model
create mode 100644 data/2022/neurips/Contextual Bandits with Knapsacks for a Conversion Model
create mode 100644 data/2022/neurips/Contextual Dynamic Pricing with Unknown Noise: Explore-then-UCB Strategy and Improved Regrets
create mode 100644 data/2022/neurips/Contextual Squeeze-and-Excitation for Efficient Few-Shot Image Classification
create mode 100644 data/2022/neurips/Continual Learning In Environments With Polynomial Mixing Times
create mode 100644 data/2022/neurips/Continual Learning with Evolving Class Ontologies
create mode 100644 data/2022/neurips/Continual learning: a feature extraction formalization, an efficient algorithm, and fundamental obstructions
create mode 100644 data/2022/neurips/Continuous Deep Q-Learning in Optimal Control Problems: Normalized Advantage Functions Analysis
create mode 100644 data/2022/neurips/Continuous MDP Homomorphisms and Homomorphic Policy Gradient
create mode 100644 data/2022/neurips/Continuously Tempered PDMP samplers
create mode 100644 data/2022/neurips/Contrastive Adapters for Foundation Model Group Robustness
create mode 100644 data/2022/neurips/Contrastive Graph Structure Learning via Information Bottleneck for Recommendation
create mode 100644 data/2022/neurips/Contrastive Language-Image Pre-Training with Knowledge Graphs
create mode 100644 data/2022/neurips/Contrastive Learning as Goal-Conditioned Reinforcement Learning
create mode 100644 data/2022/neurips/Contrastive Neural Ratio Estimation
create mode 100644 data/2022/neurips/Contrastive and Non-Contrastive Self-Supervised Learning Recover Global and Local Spectral Embedding Methods
create mode 100644 data/2022/neurips/Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields
create mode 100644 data/2022/neurips/Controllable Text Generation with Neurally-Decomposed Oracle
create mode 100644 data/2022/neurips/Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints
create mode 100644 data/2022/neurips/Convergence beyond the over-parameterized regime using Rayleigh quotients
create mode 100644 data/2022/neurips/Convergence for score-based generative modeling with polynomial complexity
create mode 100644 data/2022/neurips/Convergent Representations of Computer Programs in Human and Artificial Neural Networks
create mode 100644 data/2022/neurips/Convexity Certificates from Hessians
create mode 100644 data/2022/neurips/Convolutional Neural Networks on Graphs with Chebyshev Approximation, Revisited
create mode 100644 data/2022/neurips/Cooperative Distribution Alignment via JSD Upper Bound
create mode 100644 data/2022/neurips/Coordinate Linear Variance Reduction for Generalized Linear Programming
create mode 100644 data/2022/neurips/Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D representations
create mode 100644 data/2022/neurips/Coreset for Line-Sets Clustering
create mode 100644 data/2022/neurips/Coresets for Relational Data and The Applications
create mode 100644 data/2022/neurips/Coresets for Vertical Federated Learning: Regularized Linear Regression and $K$-Means Clustering
create mode 100644 data/2022/neurips/Coresets for Wasserstein Distributionally Robust Optimization Problems
create mode 100644 data/2022/neurips/Cost-Sensitive Self-Training for Optimizing Non-Decomposable Metrics
create mode 100644 data/2022/neurips/Cost-efficient Gaussian tensor network embeddings for tensor-structured inputs
create mode 100644 data/2022/neurips/Could Giant Pre-trained Image Models Extract Universal Representations?
create mode 100644 data/2022/neurips/Counterfactual Fairness with Partially Known Causal Graph
create mode 100644 data/2022/neurips/Counterfactual Neural Temporal Point Process for Estimating Causal Influence of Misinformation on Social Media
create mode 100644 data/2022/neurips/Counterfactual Temporal Point Processes
create mode 100644 data/2022/neurips/Counterfactual harm
create mode 100644 data/2022/neurips/CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation
create mode 100644 data/2022/neurips/CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
create mode 100644 data/2022/neurips/Cross Aggregation Transformer for Image Restoration
create mode 100644 data/2022/neurips/Cross-Image Context for Single Image Inpainting
create mode 100644 data/2022/neurips/Cross-Linked Unified Embedding for cross-modality representation learning
create mode 100644 data/2022/neurips/Cross-modal Learning for Image-Guided Point Cloud Shape Completion
create mode 100644 data/2022/neurips/CryptoGCN: Fast and Scalable Homomorphically Encrypted Graph Convolutional Network Inference
create mode 100644 data/2022/neurips/Cryptographic Hardness of Learning Halfspaces with Massart Noise
create mode 100644 data/2022/neurips/Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation
create mode 100644 data/2022/neurips/Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation
create mode 100644 data/2022/neurips/CyCLIP: Cyclic Contrastive Language-Image Pretraining
create mode 100644 data/2022/neurips/DABS 2.0: Improved Datasets and Algorithms for Universal Self-Supervision
create mode 100644 data/2022/neurips/DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization
create mode 100644 data/2022/neurips/DARE: Disentanglement-Augmented Rationale Extraction
create mode 100644 data/2022/neurips/DART: Articulated Hand Model with Diverse Accessories and Rich Textures
create mode 100644 data/2022/neurips/DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning
create mode 100644 data/2022/neurips/DC-BENCH: Dataset Condensation Benchmark
create mode 100644 data/2022/neurips/DDXPlus: A New Dataset For Automatic Medical Diagnosis
create mode 100644 data/2022/neurips/DENSE: Data-Free One-Shot Federated Learning
create mode 100644 data/2022/neurips/DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection
create mode 100644 data/2022/neurips/DHRL: A Graph-Based Approach for Long-Horizon and Sparse Hierarchical Reinforcement Learning
create mode 100644 data/2022/neurips/DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems
create mode 100644 data/2022/neurips/DISCO: Adversarial Defense with Local Implicit Functions
create mode 100644 data/2022/neurips/DMAP: a Distributed Morphological Attention Policy for learning to locomote with a changing body
create mode 100644 data/2022/neurips/DNA: Proximal Policy Optimization with a Dual Network Architecture
create mode 100644 data/2022/neurips/DOMINO: Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning
create mode 100644 data/2022/neurips/DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning
create mode 100644 data/2022/neurips/DP-PCA: Statistically Optimal and Differentially Private PCA
create mode 100644 data/2022/neurips/DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
create mode 100644 data/2022/neurips/DReS-FL: Dropout-Resilient Secure Federated Learning for Non-IID Clients via Secret Data Sharing
create mode 100644 data/2022/neurips/DTG-SSOD: Dense Teacher Guidance for Semi-Supervised Object Detection
create mode 100644 data/2022/neurips/DaDA: Distortion-aware Domain Adaptation for Unsupervised Semantic Segmentation
create mode 100644 data/2022/neurips/Dance of SNN and ANN: Solving binding problem by combining spike timing and reconstructive attention
create mode 100644 data/2022/neurips/Data Augmentation MCMC for Bayesian Inference from Privatized Data
create mode 100644 data/2022/neurips/Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome
create mode 100644 data/2022/neurips/Data Distributional Properties Drive Emergent In-Context Learning in Transformers
create mode 100644 data/2022/neurips/Data augmentation for efficient learning from parametric experts
create mode 100644 data/2022/neurips/Data-Driven Conditional Robust Optimization
create mode 100644 data/2022/neurips/Data-Driven Offline Decision-Making via Invariant Representation Learning
create mode 100644 data/2022/neurips/Data-Efficient Augmentation for Training Neural Networks
create mode 100644 data/2022/neurips/Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data
create mode 100644 data/2022/neurips/Data-Efficient Structured Pruning via Submodular Optimization
create mode 100644 data/2022/neurips/Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data
create mode 100644 data/2022/neurips/DataMUX: Data Multiplexing for Neural Networks
create mode 100644 data/2022/neurips/Dataset Distillation using Neural Feature Regression
create mode 100644 data/2022/neurips/Dataset Distillation via Factorization
create mode 100644 data/2022/neurips/Dataset Inference for Self-Supervised Models
create mode 100644 data/2022/neurips/DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes
create mode 100644 data/2022/neurips/Debiased Causal Tree: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding
create mode 100644 data/2022/neurips/Debiased Machine Learning without Sample-Splitting for Stable Estimators
create mode 100644 data/2022/neurips/Debiased Self-Training for Semi-Supervised Learning
create mode 100644 data/2022/neurips/Debiased, Longitudinal and Coordinated Drug Recommendation through Multi-Visit Clinic Records
create mode 100644 data/2022/neurips/Debiasing Graph Neural Networks via Learning Disentangled Causal Substructure
create mode 100644 data/2022/neurips/Debugging and Explaining Metric Learning Approaches: An Influence Function Based Perspective
create mode 100644 data/2022/neurips/Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks
create mode 100644 data/2022/neurips/Decentralized Local Stochastic Extra-Gradient for Variational Inequalities
create mode 100644 data/2022/neurips/Decentralized Training of Foundation Models in Heterogeneous Environments
create mode 100644 data/2022/neurips/Decentralized, Communication- and Coordination-free Learning in Structured Matching Markets
create mode 100644 data/2022/neurips/Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning
create mode 100644 data/2022/neurips/Decision Trees with Short Explainable Rules
create mode 100644 data/2022/neurips/Decision-Focused Learning without Decision-Making: Learning Locally Optimized Decision Losses
create mode 100644 data/2022/neurips/Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal
create mode 100644 data/2022/neurips/Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity
create mode 100644 data/2022/neurips/Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation
create mode 100644 data/2022/neurips/Decomposing NeRF for Editing via Feature Field Distillation
create mode 100644 data/2022/neurips/Deconfounded Representation Similarity for Comparison of Neural Networks
create mode 100644 data/2022/neurips/Decoupled Context Processing for Context Augmented Language Modeling
create mode 100644 data/2022/neurips/Decoupled Self-supervised Learning for Graphs
create mode 100644 data/2022/neurips/Decoupling Classifier for Boosting Few-shot Object Detection and Instance Segmentation
create mode 100644 data/2022/neurips/Decoupling Features in Hierarchical Propagation for Video Object Segmentation
create mode 100644 data/2022/neurips/Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning
create mode 100644 data/2022/neurips/Deep Active Learning by Leveraging Training Dynamics
create mode 100644 data/2022/neurips/Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis
create mode 100644 data/2022/neurips/Deep Attentive Belief Propagation: Integrating Reasoning and Learning for Solving Constraint Optimization Problems
create mode 100644 data/2022/neurips/Deep Bidirectional Language-Knowledge Graph Pretraining
create mode 100644 data/2022/neurips/Deep Combinatorial Aggregation
create mode 100644 data/2022/neurips/Deep Compression of Pre-trained Transformer Models
create mode 100644 data/2022/neurips/Deep Counterfactual Estimation with Categorical Background Variables
create mode 100644 data/2022/neurips/Deep Differentiable Logic Gate Networks
create mode 100644 data/2022/neurips/Deep Ensembles Work, But Are They Necessary?
create mode 100644 data/2022/neurips/Deep Equilibrium Approaches to Diffusion Models
create mode 100644 data/2022/neurips/Deep Fourier Up-Sampling
create mode 100644 "data/2022/neurips/Deep Generalized Schr\303\266dinger Bridge"
create mode 100644 data/2022/neurips/Deep Generative Model for Periodic Graphs
create mode 100644 data/2022/neurips/Deep Hierarchical Planning from Pixels
create mode 100644 data/2022/neurips/Deep Learning Methods for Proximal Inference via Maximum Moment Restriction
create mode 100644 data/2022/neurips/Deep Model Reassembly
create mode 100644 data/2022/neurips/Deep Multi-Modal Structural Equations For Causal Effect Estimation With Unstructured Proxies
create mode 100644 data/2022/neurips/Deep Surrogate Assisted Generation of Environments
create mode 100644 data/2022/neurips/Deep invariant networks with differentiable augmentation layers
create mode 100644 data/2022/neurips/DeepFoids: Adaptive Bio-Inspired Fish Simulation with Deep Reinforcement Learning
create mode 100644 data/2022/neurips/DeepInteraction: 3D Object Detection via Modality Interaction
create mode 100644 data/2022/neurips/DeepMed: Semiparametric Causal Mediation Analysis with Debiased Deep Learning
create mode 100644 data/2022/neurips/DeepTOP: Deep Threshold-Optimal Policy for MDPs and RMABs
create mode 100644 data/2022/neurips/Defending Against Adversarial Attacks via Neural Dynamic System
create mode 100644 data/2022/neurips/Defining and Characterizing Reward Gaming
create mode 100644 data/2022/neurips/Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging
create mode 100644 data/2022/neurips/Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation
create mode 100644 data/2022/neurips/Delving into Out-of-Distribution Detection with Vision-Language Representations
create mode 100644 data/2022/neurips/Delving into Sequential Patches for Deepfake Detection
create mode 100644 data/2022/neurips/Denoising Diffusion Restoration Models
create mode 100644 data/2022/neurips/Dense Interspecies Face Embedding
create mode 100644 data/2022/neurips/Density-driven Regularization for Out-of-distribution Detection
create mode 100644 data/2022/neurips/Depth is More Powerful than Width with Prediction Concatenation in Deep Forest
create mode 100644 data/2022/neurips/Descent Steps of a Relation-Aware Energy Produce Heterogeneous Graph Neural Networks
create mode 100644 data/2022/neurips/DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
create mode 100644 data/2022/neurips/Detecting Abrupt Changes in Sequential Pairwise Comparison Data
create mode 100644 data/2022/neurips/Detection and Localization of Changes in Conditional Distributions
create mode 100644 data/2022/neurips/Deterministic Langevin Monte Carlo with Normalizing Flows for Bayesian Inference
create mode 100644 data/2022/neurips/DevFly: Bio-Inspired Development of Binary Connections for Locality Preserving Sparse Codes
create mode 100644 data/2022/neurips/DiSC: Differential Spectral Clustering of Features
create mode 100644 data/2022/neurips/Diagnosing failures of fairness transfer across distribution shift in real-world medical settings
create mode 100644 data/2022/neurips/Diagonal State Spaces are as Effective as Structured State Spaces
create mode 100644 data/2022/neurips/Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech
create mode 100644 data/2022/neurips/Differentiable Analog Quantum Computing for Optimization and Control
create mode 100644 data/2022/neurips/Differentiable hierarchical and surrogate gradient search for spiking neural networks
create mode 100644 data/2022/neurips/Differentially Private Covariance Revisited
create mode 100644 data/2022/neurips/Differentially Private Generalized Linear Models Revisited
create mode 100644 data/2022/neurips/Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank
create mode 100644 data/2022/neurips/Differentially Private Learning Needs Hidden State (Or Much Faster Convergence)
create mode 100644 data/2022/neurips/Differentially Private Learning with Margin Guarantees
create mode 100644 data/2022/neurips/Differentially Private Linear Sketches: Efficient Implementations and Applications
create mode 100644 data/2022/neurips/Differentially Private Model Compression
create mode 100644 data/2022/neurips/Differentially Private Online-to-batch for Smooth Losses
create mode 100644 data/2022/neurips/Diffusion Curvature for Estimating Local Curvature in High Dimensional Data
create mode 100644 data/2022/neurips/Diffusion Models as Plug-and-Play Priors
create mode 100644 data/2022/neurips/Diffusion Visual Counterfactual Explanations
create mode 100644 data/2022/neurips/Diffusion-LM Improves Controllable Text Generation
create mode 100644 data/2022/neurips/Diffusion-based Molecule Generation with Informative Prior Bridges
create mode 100644 data/2022/neurips/DigGAN: Discriminator gradIent Gap Regularization for GAN Training with Limited Data
create mode 100644 data/2022/neurips/Direct Advantage Estimation
create mode 100644 data/2022/neurips/Discovered Policy Optimisation
create mode 100644 data/2022/neurips/Discovering Design Concepts for CAD Sketches
create mode 100644 data/2022/neurips/Discovering and Overcoming Limitations of Noise-engineered Data-free Knowledge Distillation
create mode 100644 data/2022/neurips/Discovery of Single Independent Latent Variable
create mode 100644 data/2022/neurips/Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning
create mode 100644 data/2022/neurips/Discrete-Convex-Analysis-Based Framework for Warm-Starting Algorithms with Predictions
create mode 100644 data/2022/neurips/Disentangling Causal Effects from Sets of Interventions in the Presence of Unobserved Confounders
create mode 100644 data/2022/neurips/Disentangling Transfer in Continual Reinforcement Learning
create mode 100644 data/2022/neurips/Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel
create mode 100644 data/2022/neurips/Distilled Gradient Aggregation: Purify Features for Input Attribution in the Deep Neural Network
create mode 100644 data/2022/neurips/Distilling Representations from GAN Generator via Squeeze and Span
create mode 100644 data/2022/neurips/Distinguishing Learning Rules with Brain Machine Interfaces
create mode 100644 data/2022/neurips/Distinguishing discrete and continuous behavioral variability using warped autoregressive HMMs
create mode 100644 data/2022/neurips/Distributed Distributionally Robust Optimization with Non-Convex Objectives
create mode 100644 data/2022/neurips/Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems
create mode 100644 data/2022/neurips/Distributed Inverse Constrained Reinforcement Learning for Multi-agent Systems
create mode 100644 data/2022/neurips/Distributed Learning of Conditional Quantiles in the Reproducing Kernel Hilbert Space
create mode 100644 data/2022/neurips/Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees
create mode 100644 data/2022/neurips/Distributed Online Convex Optimization with Compressed Communication
create mode 100644 data/2022/neurips/Distributed Optimization for Overparameterized Problems: Achieving Optimal Dimension Independent Communication Complexity
create mode 100644 data/2022/neurips/Distribution-Informed Neural Networks for Domain Adaptation Regression
create mode 100644 data/2022/neurips/Distributional Convergence of the Sliced Wasserstein Process
create mode 100644 data/2022/neurips/Distributional Reinforcement Learning for Risk-Sensitive Policies
create mode 100644 data/2022/neurips/Distributional Reward Estimation for Effective Multi-agent Deep Reinforcement Learning
create mode 100644 data/2022/neurips/Distributionally Adaptive Meta Reinforcement Learning
create mode 100644 data/2022/neurips/Distributionally Robust Optimization via Ball Oracle Acceleration
create mode 100644 data/2022/neurips/Distributionally Robust Optimization with Data Geometry
create mode 100644 data/2022/neurips/Distributionally robust weighted k-nearest neighbors
create mode 100644 data/2022/neurips/DivBO: Diversity-aware CASH for Ensemble Learning
create mode 100644 data/2022/neurips/Diverse Weight Averaging for Out-of-Distribution Generalization
create mode 100644 data/2022/neurips/Diversified Recommendations for Agents with Adaptive Preferences
create mode 100644 data/2022/neurips/Diversity vs. Recognizability: Human-like generalization in one-shot generative models
create mode 100644 data/2022/neurips/Divert More Attention to Vision-Language Tracking
create mode 100644 data/2022/neurips/Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning
create mode 100644 data/2022/neurips/Do Current Multi-Task Optimization Methods in Deep Learning Even Help?
create mode 100644 data/2022/neurips/Do Residual Neural Networks discretize Neural Ordinary Differential Equations?
create mode 100644 data/2022/neurips/Does GNN Pretraining Help Molecular Representation?
create mode 100644 data/2022/neurips/Does Momentum Change the Implicit Regularization on Separable Data?
create mode 100644 data/2022/neurips/Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?
create mode 100644 data/2022/neurips/Domain Adaptation meets Individual Fairness. And they get along
create mode 100644 data/2022/neurips/Domain Adaptation under Open Set Label Shift
create mode 100644 data/2022/neurips/Domain Generalization by Learning and Removing Domain-specific Features
create mode 100644 data/2022/neurips/Domain Generalization without Excess Empirical Risk
create mode 100644 data/2022/neurips/Don't Pour Cereal into Coffee: Differentiable Temporal Logic for Temporal Action Segmentation
create mode 100644 data/2022/neurips/Don't Roll the Dice, Ask Twice: The Two-Query Distortion of Matching Problems and Beyond
create mode 100644 data/2022/neurips/Double Bubble, Toil and Trouble: Enhancing Certified Robustness through Transitivity
create mode 100644 data/2022/neurips/Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination
create mode 100644 data/2022/neurips/Doubly Robust Counterfactual Classification
create mode 100644 data/2022/neurips/Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions
create mode 100644 data/2022/neurips/Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer
create mode 100644 data/2022/neurips/Drawing out of Distribution with Neuro-Symbolic Generative Models
create mode 100644 data/2022/neurips/DreamShard: Generalizable Embedding Table Placement for Recommender Systems
create mode 100644 data/2022/neurips/DropCov: A Simple yet Effective Method for Improving Deep Architectures
create mode 100644 data/2022/neurips/Dual-Curriculum Contrastive Multi-Instance Learning for Cancer Prognosis Analysis with Whole Slide Images
create mode 100644 data/2022/neurips/Dual-discriminative Graph Neural Network for Imbalanced Graph-level Anomaly Detection
create mode 100644 data/2022/neurips/DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations
create mode 100644 data/2022/neurips/Dungeons and Data: A Large-Scale NetHack Dataset
create mode 100644 data/2022/neurips/Dynamic Fair Division with Partial Information
create mode 100644 data/2022/neurips/Dynamic Graph Neural Networks Under Spatio-Temporal Distribution Shift
create mode 100644 data/2022/neurips/Dynamic Inverse Reinforcement Learning for Characterizing Animal Behavior
create mode 100644 data/2022/neurips/Dynamic Learning in Large Matching Markets
create mode 100644 data/2022/neurips/Dynamic Pricing with Monotonicity Constraint under Unknown Parametric Demand Model
create mode 100644 "data/2022/neurips/Dynamic Sparse Network for Time Series Classification: Learning What to \"See\""
create mode 100644 data/2022/neurips/Dynamic Tensor Product Regression
create mode 100644 data/2022/neurips/Dynamic pricing and assortment under a contextual MNL demand
create mode 100644 data/2022/neurips/Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution
create mode 100644 data/2022/neurips/E-MAPP: Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance
create mode 100644 data/2022/neurips/EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL
create mode 100644 data/2022/neurips/EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization
create mode 100644 data/2022/neurips/EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations
create mode 100644 data/2022/neurips/EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records
create mode 100644 data/2022/neurips/ELASTIC: Numerical Reasoning with Adaptive Symbolic Compiler
create mode 100644 data/2022/neurips/ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
create mode 100644 data/2022/neurips/ELIAS: End-to-End Learning to Index and Search in Large Output Spaces
create mode 100644 data/2022/neurips/ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward
create mode 100644 data/2022/neurips/ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts
create mode 100644 data/2022/neurips/EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
create mode 100644 data/2022/neurips/ESCADA: Efficient Safety and Context Aware Dose Allocation for Precision Medicine
create mode 100644 data/2022/neurips/ETAB: A Benchmark Suite for Visual Representation Learning in Echocardiography
create mode 100644 data/2022/neurips/EZNAS: Evolving Zero-Cost Proxies For Neural Architecture Scoring
create mode 100644 data/2022/neurips/Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks
create mode 100644 data/2022/neurips/Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
create mode 100644 data/2022/neurips/EcoFormer: Energy-Saving Attention with Linear Complexity
create mode 100644 data/2022/neurips/Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving
create mode 100644 data/2022/neurips/Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples
create mode 100644 data/2022/neurips/Effective Dimension in Bandit Problems under Censorship
create mode 100644 data/2022/neurips/Effectiveness of Vision Transformer for Fast and Accurate Single-Stage Pedestrian Detection
create mode 100644 data/2022/neurips/Effects of Data Geometry in Early Deep Learning
create mode 100644 data/2022/neurips/Efficiency Ordering of Stochastic Gradient Descent
create mode 100644 data/2022/neurips/Efficient Active Learning with Abstention
create mode 100644 data/2022/neurips/Efficient Adversarial Training without Attacking: Worst-Case-Aware Robust Reinforcement Learning
create mode 100644 data/2022/neurips/Efficient Aggregated Kernel Tests using Incomplete $U$-statistics
create mode 100644 data/2022/neurips/Efficient Architecture Search for Diverse Tasks
create mode 100644 data/2022/neurips/Efficient Dataset Distillation using Random Feature Approximation
create mode 100644 data/2022/neurips/Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems
create mode 100644 data/2022/neurips/Efficient Graph Similarity Computation with Alignment Regularization
create mode 100644 data/2022/neurips/Efficient Knowledge Distillation from Model Checkpoints
create mode 100644 data/2022/neurips/Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation
create mode 100644 data/2022/neurips/Efficient Methods for Non-stationary Online Learning
create mode 100644 data/2022/neurips/Efficient Multi-agent Communication via Self-supervised Information Aggregation
create mode 100644 data/2022/neurips/Efficient Non-Parametric Optimizer Search for Diverse Tasks
create mode 100644 data/2022/neurips/Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent
create mode 100644 data/2022/neurips/Efficient Risk-Averse Reinforcement Learning
create mode 100644 data/2022/neurips/Efficient Sampling on Riemannian Manifolds via Langevin MCMC
create mode 100644 data/2022/neurips/Efficient Scheduling of Data Augmentation for Deep Reinforcement Learning
create mode 100644 data/2022/neurips/Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models
create mode 100644 data/2022/neurips/Efficient Submodular Optimization under Noise: Local Search is Robust
create mode 100644 data/2022/neurips/Efficient Training of Low-Curvature Neural Networks
create mode 100644 data/2022/neurips/Efficient and Effective Augmentation Strategy for Adversarial Training
create mode 100644 data/2022/neurips/Efficient and Effective Multi-task Grouping via Meta Learning on Task Combinations
create mode 100644 data/2022/neurips/Efficient and Effective Optimal Transport-Based Biclustering
create mode 100644 data/2022/neurips/Efficient and Modular Implicit Differentiation
create mode 100644 data/2022/neurips/Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions
create mode 100644 data/2022/neurips/Efficient and Stable Fully Dynamic Facility Location
create mode 100644 data/2022/neurips/Efficient coding, channel capacity, and the emergence of retinal mosaics
create mode 100644 data/2022/neurips/Efficient identification of informative features in simulation-based inference
create mode 100644 data/2022/neurips/Efficient learning of nonlinear prediction models with time-series privileged information
create mode 100644 data/2022/neurips/EfficientFormer: Vision Transformers at MobileNet Speed
create mode 100644 data/2022/neurips/Efficiently Computing Local Lipschitz Constants of Neural Networks via Bound Propagation
create mode 100644 data/2022/neurips/Efficiently Factorizing Boolean Matrices using Proximal Gradient Descent
create mode 100644 data/2022/neurips/EgoTaskQA: Understanding Human Tasks in Egocentric Videos
create mode 100644 data/2022/neurips/Egocentric Video-Language Pretraining
create mode 100644 data/2022/neurips/ElasticMVS: Learning elastic part representation for self-supervised multi-view stereopsis
create mode 100644 data/2022/neurips/Eliciting Thinking Hierarchy without a Prior
create mode 100644 data/2022/neurips/Elucidating the Design Space of Diffusion-Based Generative Models
create mode 100644 data/2022/neurips/Embed and Emulate: Learning to estimate parameters of dynamical systems with uncertainty quantification
create mode 100644 data/2022/neurips/Embodied Scene-aware Human Pose Estimation
create mode 100644 data/2022/neurips/Embrace the Gap: VAEs Perform Independent Mechanism Analysis
create mode 100644 data/2022/neurips/Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
create mode 100644 data/2022/neurips/Emergence of Hierarchical Layers in a Single Sheet of Self-Organizing Spiking Neurons
create mode 100644 data/2022/neurips/Emergent Communication: Generalization and Overfitting in Lewis Games
create mode 100644 data/2022/neurips/Emergent Graphical Conventions in a Visual Communication Game
create mode 100644 data/2022/neurips/Empirical Gateaux Derivatives for Causal Inference
create mode 100644 data/2022/neurips/Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width
create mode 100644 data/2022/neurips/Enabling Detailed Action Recognition Evaluation Through Video Dataset Augmentation
create mode 100644 data/2022/neurips/End-to-end Algorithm Synthesis with Recurrent Networks: Extrapolation without Overthinking
create mode 100644 data/2022/neurips/End-to-end Stochastic Optimization with Energy-based Model
create mode 100644 data/2022/neurips/End-to-end Symbolic Regression with Transformers
create mode 100644 data/2022/neurips/Energy-Based Contrastive Learning of Visual Representations
create mode 100644 data/2022/neurips/Enhance the Visual Representation via Discrete Adversarial Training
create mode 100644 data/2022/neurips/Enhanced Bilevel Optimization via Bregman Distance
create mode 100644 data/2022/neurips/Enhanced Latent Space Blind Model for Real Image Denoising via Alternative Optimization
create mode 100644 data/2022/neurips/Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments
create mode 100644 data/2022/neurips/Enhancing Safe Exploration Using Safety State Augmentation
create mode 100644 data/2022/neurips/Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization
create mode 100644 data/2022/neurips/Entropy-Driven Mixed-Precision Quantization for Deep Network Design
create mode 100644 data/2022/neurips/EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine
create mode 100644 data/2022/neurips/Environment Diversification with Multi-head Neural Network for Invariant Learning
create mode 100644 data/2022/neurips/Envy-free Policy Teaching to Multiple Agents
create mode 100644 data/2022/neurips/EpiGRAF: Rethinking training of 3D GANs
create mode 100644 data/2022/neurips/Equivariant Graph Hierarchy-Based Neural Networks
create mode 100644 data/2022/neurips/Equivariant Networks for Crystal Structures
create mode 100644 data/2022/neurips/Equivariant Networks for Zero-Shot Coordination
create mode 100644 data/2022/neurips/Error Analysis of Tensor-Train Cross Approximation
create mode 100644 data/2022/neurips/Error Correction Code Transformer
create mode 100644 data/2022/neurips/Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
create mode 100644 data/2022/neurips/Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning
create mode 100644 data/2022/neurips/Escaping from the Barren Plateau via Gaussian Initializations in Deep Variational Quantum Circuits
create mode 100644 data/2022/neurips/Estimating Noise Transition Matrix with Label Correlations for Noisy Multi-Label Learning
create mode 100644 data/2022/neurips/Estimating and Explaining Model Performance When Both Covariates and Labels Shift
create mode 100644 data/2022/neurips/Estimating graphical models for count data with applications to single-cell gene network
create mode 100644 data/2022/neurips/Estimating the Arc Length of the Optimal ROC Curve and Lower Bounding the Maximal AUC
create mode 100644 data/2022/neurips/Estimation of Entropy in Constant Space with Improved Sample Complexity
create mode 100644 data/2022/neurips/Evaluated CMI Bounds for Meta Learning: Tightness and Expressiveness
create mode 100644 data/2022/neurips/Evaluating Graph Generative Models with Contrastively Learned Features
create mode 100644 data/2022/neurips/Evaluating Latent Space Robustness and Uncertainty of EEG-ML Models under Realistic Distribution Shifts
create mode 100644 data/2022/neurips/Evaluating Out-of-Distribution Performance on Document Image Classifiers
create mode 100644 data/2022/neurips/Evaluating Robustness to Dataset Shift via Parametric Robustness Sets
create mode 100644 data/2022/neurips/Evaluation beyond Task Performance: Analyzing Concepts in AlphaZero in Hex
create mode 100644 data/2022/neurips/EvenNet: Ignoring Odd-Hop Neighbors Improves Robustness of Graph Neural Networks
create mode 100644 data/2022/neurips/Evolution of Neural Tangent Kernels under Benign and Adversarial Training
create mode 100644 data/2022/neurips/Exact Shape Correspondence via 2D graph convolution
create mode 100644 data/2022/neurips/Exact Solutions of a Deep Linear Network
create mode 100644 data/2022/neurips/Exact learning dynamics of deep linear networks with prior knowledge
create mode 100644 data/2022/neurips/Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation
create mode 100644 data/2022/neurips/Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
create mode 100644 data/2022/neurips/Expected Improvement for Contextual Bandits
create mode 100644 data/2022/neurips/Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
create mode 100644 data/2022/neurips/Experimental Design for Linear Functionals in Reproducing Kernel Hilbert Spaces
create mode 100644 data/2022/neurips/Explain My Surprise: Learning Efficient Long-Term Memory by predicting uncertain outcomes
create mode 100644 data/2022/neurips/Explainability Via Causal Self-Talk
create mode 100644 data/2022/neurips/Explainable Reinforcement Learning via Model Transforms
create mode 100644 data/2022/neurips/Explaining Preferences with Shapley Values
create mode 100644 data/2022/neurips/Explicable Policy Search
create mode 100644 data/2022/neurips/Explicit Tradeoffs between Adversarial and Natural Distributional Robustness
create mode 100644 data/2022/neurips/Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping
create mode 100644 data/2022/neurips/Exploitability Minimization in Games and Beyond
create mode 100644 data/2022/neurips/Exploiting Semantic Relations for Glass Surface Detection
create mode 100644 data/2022/neurips/Exploiting the Relationship Between Kendall's Rank Correlation and Cosine Similarity for Attribution Protection
create mode 100644 data/2022/neurips/Exploration via Elliptical Episodic Bonuses
create mode 100644 data/2022/neurips/Exploration via Planning for Information about the Optimal Trajectory
create mode 100644 data/2022/neurips/Exploration-Guided Reward Shaping for Reinforcement Learning under Sparse Rewards
create mode 100644 data/2022/neurips/Exploring Example Influence in Continual Learning
create mode 100644 data/2022/neurips/Exploring Figure-Ground Assignment Mechanism in Perceptual Organization
create mode 100644 data/2022/neurips/Exploring Length Generalization in Large Language Models
create mode 100644 data/2022/neurips/Exploring evolution-aware & -free protein language models as protein function predictors
create mode 100644 data/2022/neurips/Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability
create mode 100644 data/2022/neurips/Exploring the Latent Space of Autoencoders with Interventional Assays
create mode 100644 data/2022/neurips/Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
create mode 100644 data/2022/neurips/Exploring the Whole Rashomon Set of Sparse Decision Trees
create mode 100644 data/2022/neurips/Exploring through Random Curiosity with General Value Functions
create mode 100644 data/2022/neurips/Exponential Family Model-Based Reinforcement Learning via Score Matching
create mode 100644 data/2022/neurips/Exponential Separations in Symmetric Neural Networks
create mode 100644 data/2022/neurips/Exponentially Improving the Complexity of Simulating the Weisfeiler-Lehman Test with Graph Neural Networks
create mode 100644 data/2022/neurips/Exposing and Exploiting Fine-Grained Block Structures for Fast and Accurate Sparse Training
create mode 100644 data/2022/neurips/Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods
create mode 100644 data/2022/neurips/Extracting computational mechanisms from neural data using low-rank RNNs
create mode 100644 data/2022/neurips/Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study
create mode 100644 data/2022/neurips/Extrapolative Continuous-time Bayesian Neural Network for Fast Training-free Test-time Adaptation
create mode 100644 data/2022/neurips/FACT: Learning Governing Abstractions Behind Integer Sequences
create mode 100644 data/2022/neurips/FETA: Towards Specializing Foundational Models for Expert Task Applications
create mode 100644 data/2022/neurips/FIRE: Semantic Field of Words Represented as Non-Linear Functions
create mode 100644 data/2022/neurips/FLAIR: Federated Learning Annotated Image Repository
create mode 100644 data/2022/neurips/FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings
create mode 100644 data/2022/neurips/FNeVR: Neural Volume Rendering for Face Animation
create mode 100644 data/2022/neurips/FOF: Learning Fourier Occupancy Field for Monocular Real-time Human Reconstruction
create mode 100644 data/2022/neurips/FP8 Quantization: The Power of the Exponent
create mode 100644 data/2022/neurips/FR: Folded Rationalization with a Unified Encoder
create mode 100644 data/2022/neurips/Factored Adaptation for Non-Stationary Reinforcement Learning
create mode 100644 data/2022/neurips/Factored DRO: Factored Distributionally Robust Policies for Contextual Bandits
create mode 100644 data/2022/neurips/Factorized-FL: Personalized Federated Learning with Parameter Factorization & Similarity Matching
create mode 100644 data/2022/neurips/Factuality Enhanced Language Models for Open-Ended Text Generation
create mode 100644 data/2022/neurips/Fair Bayes-Optimal Classifiers Under Predictive Parity
create mode 100644 data/2022/neurips/Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting
create mode 100644 data/2022/neurips/Fair Rank Aggregation
create mode 100644 data/2022/neurips/Fair Ranking with Noisy Protected Attributes
create mode 100644 data/2022/neurips/Fair Wrapping for Black-box Predictions
create mode 100644 data/2022/neurips/Fair and Efficient Allocations Without Obvious Manipulations
create mode 100644 data/2022/neurips/Fair and Optimal Decision Trees: A Dynamic Programming Approach
create mode 100644 data/2022/neurips/FairVFL: A Fair Vertical Federated Learning Framework with Contrastive Adversarial Learning
create mode 100644 data/2022/neurips/Fairness Reprogramming
create mode 100644 data/2022/neurips/Fairness Transferability Subject to Bounded Distribution Shift
create mode 100644 data/2022/neurips/Fairness in Federated Learning via Core-Stability
create mode 100644 data/2022/neurips/Fairness without Demographics through Knowledge Distillation
create mode 100644 data/2022/neurips/Falconn++: A Locality-sensitive Filtering Approach for Approximate Nearest Neighbor Search
create mode 100644 data/2022/neurips/Falsification before Extrapolation in Causal Effect Estimation
create mode 100644 data/2022/neurips/Fast Algorithms for Packing Proportional Fairness and its Dual
create mode 100644 data/2022/neurips/Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement
create mode 100644 data/2022/neurips/Fast Bayesian Estimation of Point Process Intensity as Function of Covariates
create mode 100644 data/2022/neurips/Fast Bayesian Inference with Batch Bayesian Quadrature via Kernel Recombination
create mode 100644 data/2022/neurips/Fast Distance Oracles for Any Symmetric Norm
create mode 100644 data/2022/neurips/Fast Instrument Learning with Faster Rates
create mode 100644 data/2022/neurips/Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay
create mode 100644 data/2022/neurips/Fast Neural Kernel Embeddings for General Activations
create mode 100644 data/2022/neurips/Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization
create mode 100644 data/2022/neurips/Fast Vision Transformers with HiLo Attention
create mode 100644 data/2022/neurips/Faster Deep Reinforcement Learning with Slower Online Network
create mode 100644 data/2022/neurips/Faster Linear Algebra for Distance Matrices
create mode 100644 data/2022/neurips/Faster and Scalable Algorithms for Densest Subgraph and Decomposition
create mode 100644 data/2022/neurips/FasterRisk: Fast and Accurate Interpretable Risk Scores
create mode 100644 data/2022/neurips/Fault-Aware Neural Code Rankers
create mode 100644 data/2022/neurips/FeLMi : Few shot Learning with hard Mixup
create mode 100644 data/2022/neurips/Feature Learning in $L_2$-regularized DNNs: Attraction Repulsion and Sparsity
create mode 100644 data/2022/neurips/Feature-Proxy Transformer for Few-Shot Segmentation
create mode 100644 data/2022/neurips/FedAvg with Fine Tuning: Local Updates Lead to Representation Learning
create mode 100644 data/2022/neurips/FedPop: A Bayesian Approach for Personalised Federated Learning
create mode 100644 data/2022/neurips/FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction
create mode 100644 data/2022/neurips/FedSR: A Simple and Effective Domain Generalization Method for Federated Learning
create mode 100644 data/2022/neurips/Federated Learning from Pre-Trained Models: A Contrastive Learning Approach
create mode 100644 data/2022/neurips/Few-Shot Audio-Visual Learning of Environment Acoustics
create mode 100644 data/2022/neurips/Few-Shot Continual Active Learning by a Robot
create mode 100644 data/2022/neurips/Few-Shot Fast-Adaptive Anomaly Detection
create mode 100644 data/2022/neurips/Few-Shot Non-Parametric Learning with Deep Latent Variable Model
create mode 100644 data/2022/neurips/Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
create mode 100644 data/2022/neurips/Few-shot Image Generation via Adaptation-Aware Kernel Modulation
create mode 100644 data/2022/neurips/Few-shot Learning for Feature Selection with Hilbert-Schmidt Independence Criterion
create mode 100644 data/2022/neurips/Few-shot Relational Reasoning via Connection Subgraph Pretraining
create mode 100644 data/2022/neurips/Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
create mode 100644 data/2022/neurips/FiLM-Ensemble: Probabilistic Deep Learning via Feature-wise Linear Modulation
create mode 100644 data/2022/neurips/FiLM: Frequency improved Legendre Memory Model for Long-term Time Series Forecasting
create mode 100644 data/2022/neurips/FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning
create mode 100644 data/2022/neurips/Finding Correlated Equilibrium of Constrained Markov Game: A Primal-Dual Approach
create mode 100644 data/2022/neurips/Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing
create mode 100644 data/2022/neurips/Finding Naturally Occurring Physical Backdoors in Image Datasets
create mode 100644 data/2022/neurips/Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget
create mode 100644 data/2022/neurips/Finding Second-Order Stationary Points in Nonconvex-Strongly-Concave Minimax Optimization
create mode 100644 data/2022/neurips/Finding and Listing Front-door Adjustment Sets
create mode 100644 data/2022/neurips/Fine-Grained Analysis of Stability and Generalization for Modern Meta Learning Algorithms
create mode 100644 data/2022/neurips/Fine-Grained Semantically Aligned Vision-Language Pre-Training
create mode 100644 data/2022/neurips/Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively
create mode 100644 data/2022/neurips/Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees
create mode 100644 data/2022/neurips/Fine-tuning language models to find agreement among humans with diverse preferences
create mode 100644 data/2022/neurips/Finite Sample Analysis Of Dynamic Regression Parameter Learning
create mode 100644 data/2022/neurips/Finite-Sample Maximum Likelihood Estimation of Location
create mode 100644 data/2022/neurips/Finite-Time Analysis of Adaptive Temporal Difference Learning with Deep Neural Networks
create mode 100644 data/2022/neurips/Finite-Time Last-Iterate Convergence for Learning in Multi-Player Games
create mode 100644 data/2022/neurips/Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits
create mode 100644 data/2022/neurips/First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization
create mode 100644 data/2022/neurips/First Hitting Diffusion Models for Generating Manifold, Graph and Categorical Data
create mode 100644 data/2022/neurips/First is Better Than Last for Language Data Influence
create mode 100644 data/2022/neurips/First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces
create mode 100644 data/2022/neurips/Fixed-Distance Hamiltonian Monte Carlo
create mode 100644 data/2022/neurips/Flamingo: a Visual Language Model for Few-Shot Learning
create mode 100644 data/2022/neurips/Flare7K: A Phenomenological Nighttime Flare Removal Dataset
create mode 100644 data/2022/neurips/FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
create mode 100644 data/2022/neurips/Flexible Diffusion Modeling of Long Videos
create mode 100644 data/2022/neurips/Flexible Neural Image Compression via Code Editing
create mode 100644 data/2022/neurips/FlowHMM: Flow-based continuous hidden Markov models
create mode 100644 data/2022/neurips/Flowification: Everything is a normalizing flow
create mode 100644 data/2022/neurips/FlyView: a bio-informed optical flow truth dataset for visual navigation using panoramic stereo vision
create mode 100644 data/2022/neurips/Focal Modulation Networks
create mode 100644 data/2022/neurips/Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback
create mode 100644 data/2022/neurips/Forecasting Future World Events With Neural Networks
create mode 100644 data/2022/neurips/Forecasting Human Trajectory from Scene History
create mode 100644 data/2022/neurips/Formalizing Consistency and Coherence of Representation Learning
create mode 100644 data/2022/neurips/Formulating Robustness Against Unforeseen Attacks
create mode 100644 data/2022/neurips/Forward-Backward Latent State Inference for Hidden Continuous-Time semi-Markov Chains
create mode 100644 data/2022/neurips/Foundation Posteriors for Approximate Probabilistic Inference
create mode 100644 data/2022/neurips/FourierFormer: Transformer Meets Generalized Fourier Integral Theorem
create mode 100644 data/2022/neurips/FourierNets enable the design of highly non-local optical encoders for computational imaging
create mode 100644 data/2022/neurips/Frank-Wolfe-based Algorithms for Approximating Tyler's M-estimator
create mode 100644 data/2022/neurips/FreGAN: Exploiting Frequency Components for Training GANs under Limited Data
create mode 100644 data/2022/neurips/Free Probability for predicting the performance of feed-forward fully connected neural networks
create mode 100644 data/2022/neurips/Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attack
create mode 100644 data/2022/neurips/From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent
create mode 100644 data/2022/neurips/Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images
create mode 100644 data/2022/neurips/Fully Sparse 3D Object Detection
create mode 100644 data/2022/neurips/Function Classes for Identifiable Nonlinear Independent Component Analysis
create mode 100644 data/2022/neurips/Functional Ensemble Distillation
create mode 100644 data/2022/neurips/Functional Indirection Neural Estimator for Better Out-of-distribution Generalization
create mode 100644 data/2022/neurips/Fused Orthogonal Alternating Least Squares for Tensor Clustering
create mode 100644 data/2022/neurips/Fuzzy Learning Machine
create mode 100644 data/2022/neurips/GAGA: Deciphering Age-path of Generalized Self-paced Regularizer
create mode 100644 data/2022/neurips/GAL: Gradient Assisted Learning for Decentralized Multi-Organization Collaborations
create mode 100644 data/2022/neurips/GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis
create mode 100644 data/2022/neurips/GAMA: Generative Adversarial Multi-Object Scene Attacks
create mode 100644 data/2022/neurips/GAPX: Generalized Autoregressive Paraphrase-Identification X
create mode 100644 data/2022/neurips/GAR: Generalized Autoregression for Multi-Fidelity Fusion
create mode 100644 data/2022/neurips/GAUDI: A Neural Architect for Immersive 3D Scene Generation
create mode 100644 data/2022/neurips/GBA: A Tuning-free Approach to Switch between Synchronous and Asynchronous Training for Recommendation Models
create mode 100644 data/2022/neurips/GENIE: Higher-Order Denoising Diffusion Solvers
create mode 100644 data/2022/neurips/GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images
create mode 100644 data/2022/neurips/GLIF: A Unified Gated Leaky Integrate-and-Fire Neuron for Spiking Neural Networks
create mode 100644 data/2022/neurips/GLIPv2: Unifying Localization and Vision-Language Understanding
create mode 100644 data/2022/neurips/GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization
create mode 100644 data/2022/neurips/GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models
create mode 100644 data/2022/neurips/GOOD: A Graph Out-of-Distribution Benchmark
create mode 100644 data/2022/neurips/GPT3.int8(): 8-bit Matrix Multiplication for Transformers at Scale
create mode 100644 data/2022/neurips/GRASP: Navigating
Retrosynthetic Planning with Goal-driven Policy create mode 100644 data/2022/neurips/GREED: A Neural Framework for Learning Graph Distance Functions create mode 100644 data/2022/neurips/GStarX: Explaining Graph Neural Networks with Structure-Aware Cooperative Games create mode 100644 data/2022/neurips/GT-GAN: General Purpose Time Series Synthesis with Generative Adversarial Networks create mode 100644 data/2022/neurips/GULP: a prediction-based metric between representations create mode 100644 data/2022/neurips/Gaussian Copula Embeddings create mode 100644 data/2022/neurips/GenSDF: Two-Stage Learning of Generalizable Signed Distance Functions create mode 100644 data/2022/neurips/GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech create mode 100644 data/2022/neurips/General Cutting Planes for Bound-Propagation-Based Neural Network Verification create mode 100644 data/2022/neurips/Generalised Implicit Neural Representations create mode 100644 data/2022/neurips/Generalised Mutual Information for Discriminative Clustering create mode 100644 data/2022/neurips/Generalization Analysis of Message Passing Neural Networks on Large Random Graphs create mode 100644 data/2022/neurips/Generalization Analysis on Learning with a Concurrent Verifier create mode 100644 data/2022/neurips/Generalization Bounds for Estimating Causal Effects of Continuous Treatments create mode 100644 data/2022/neurips/Generalization Bounds for Gradient Methods via Discrete and Continuous Prior create mode 100644 data/2022/neurips/Generalization Bounds with Minimal Dependency on Hypothesis Class via Distributionally Robust Optimization create mode 100644 data/2022/neurips/Generalization Error Bounds on Deep Learning with Markov Datasets create mode 100644 data/2022/neurips/Generalization Gap in Amortized Inference create mode 100644 data/2022/neurips/Generalization Properties of NAS under Activation and Skip Connection Search create mode 100644 data/2022/neurips/Generalization for multiclass classification with overparameterized linear models create mode 100644 data/2022/neurips/Generalized Delayed Feedback Model with Post-Click Information in Recommender Systems create mode 100644 data/2022/neurips/Generalized Laplacian Eigenmaps create mode 100644 data/2022/neurips/Generalized One-shot Domain Adaptation of Generative Adversarial Networks create mode 100644 data/2022/neurips/Generalized Variational Inference in Function Spaces: Gaussian Measures meet Bayesian Deep Learning create mode 100644 data/2022/neurips/Generalizing Bayesian Optimization with Decision-theoretic Entropies create mode 100644 data/2022/neurips/Generalizing Consistent Multi-Class Classification with Rejection to be Compatible with Arbitrary Losses create mode 100644 data/2022/neurips/Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning create mode 100644 data/2022/neurips/Generating Long Videos of Dynamic Scenes create mode 100644 data/2022/neurips/Generating Training Data with Language Models: Towards Zero-Shot Language Understanding create mode 100644 data/2022/neurips/Generating multivariate time series with COmmon Source CoordInated GAN (COSCI-GAN) create mode 100644 data/2022/neurips/Generative Neural Articulated Radiance Fields create mode 100644 data/2022/neurips/Generative Status Estimation and Information Decoupling for Image Rain Removal create mode 100644 data/2022/neurips/Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement create mode 100644 
data/2022/neurips/Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models create mode 100644 data/2022/neurips/Generative multitask learning mitigates target-causing confounding create mode 100644 data/2022/neurips/Generic bounds on the approximation error for physics-informed (and) operator learning create mode 100644 data/2022/neurips/Geo-Neus: Geometry-Consistent Neural Implicit Surfaces Learning for Multi-view Reconstruction create mode 100644 data/2022/neurips/Geo-SIC: Learning Deformable Geometric Shapes in Deep Image Classifiers create mode 100644 data/2022/neurips/Geoclidean: Few-Shot Generalization in Euclidean Geometry create mode 100644 data/2022/neurips/Geodesic Graph Neural Network for Efficient Graph Representation Learning create mode 100644 data/2022/neurips/Geodesic Self-Attention for 3D Point Clouds create mode 100644 data/2022/neurips/Geometric Knowledge Distillation: Topology Compression for Graph Neural Networks create mode 100644 data/2022/neurips/Geometric Order Learning for Rank Estimation create mode 100644 data/2022/neurips/Geometry-aware Two-scale PIFu Representation for Human Reconstruction create mode 100644 data/2022/neurips/Get More at Once: Alternating Sparse Training with Gradient Correction create mode 100644 data/2022/neurips/GhostNetV2: Enhance Cheap Operation with Long-Range Attention create mode 100644 data/2022/neurips/Giga-scale Kernel Matrix-Vector Multiplication on GPU create mode 100644 data/2022/neurips/Giving Feedback on Interactive Student Programs with Meta-Exploration create mode 100644 data/2022/neurips/GlanceNets: Interpretable, Leak-proof Concept-based Models create mode 100644 data/2022/neurips/Global Convergence and Stability of Stochastic Gradient Descent create mode 100644 data/2022/neurips/Global Convergence of Federated Learning for Mixed Regression create mode 100644 data/2022/neurips/Global Linear and Local Superlinear Convergence of IRLS for Non-Smooth Robust Regression create mode 100644 data/2022/neurips/Global Normalization for Streaming Speech Recognition in a Modular Framework create mode 100644 data/2022/neurips/Global Optimal K-Medoids Clustering of One Million Samples create mode 100644 data/2022/neurips/Globally Convergent Policy Search for Output Estimation create mode 100644 data/2022/neurips/Globally Gated Deep Linear Networks create mode 100644 "data/2022/neurips/Gold-standard solutions to the Schr\303\266dinger equation using deep learning: How much physics do we need?" 
create mode 100644 data/2022/neurips/GraB: Finding Provably Better Data Permutations than Random Reshuffling create mode 100644 data/2022/neurips/Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound create mode 100644 data/2022/neurips/Gradient Descent: The Ultimate Optimizer create mode 100644 data/2022/neurips/Gradient Estimation with Discrete Stein Operators create mode 100644 data/2022/neurips/Gradient Methods Provably Converge to Non-Robust Networks create mode 100644 data/2022/neurips/Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs create mode 100644 data/2022/neurips/Gradient-Free Methods for Deterministic and Stochastic Nonsmooth Nonconvex Optimization create mode 100644 data/2022/neurips/Graph Coloring via Neural Networks for Haplotype Assembly and Viral Quasispecies Reconstruction create mode 100644 data/2022/neurips/Graph Convolution Network based Recommender Systems: Learning Guarantee and Item Mixture Powered Strategy create mode 100644 data/2022/neurips/Graph Few-shot Learning with Task-specific Structures create mode 100644 data/2022/neurips/Graph Learning Assisted Multi-Objective Integer Programming create mode 100644 data/2022/neurips/Graph Neural Network Bandits create mode 100644 data/2022/neurips/Graph Neural Networks are Dynamic Programmers create mode 100644 data/2022/neurips/Graph Neural Networks with Adaptive Readouts create mode 100644 data/2022/neurips/Graph Reordering for Cache-Efficient Near Neighbor Search create mode 100644 data/2022/neurips/Graph Scattering beyond Wavelet Shackles create mode 100644 data/2022/neurips/Graph Self-supervised Learning with Accurate Discrepancy Learning create mode 100644 data/2022/neurips/GraphDE: A Generative Framework for Debiased Learning and Out-of-Distribution Detection on Graphs create mode 100644 data/2022/neurips/GraphQNTK: Quantum Neural Tangent Kernel for Graph Data create mode 100644 data/2022/neurips/Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks create mode 100644 data/2022/neurips/Green Hierarchical Vision Transformer for Masked Image Modeling create mode 100644 data/2022/neurips/GriddlyJS: A Web IDE for Reinforcement Learning create mode 100644 data/2022/neurips/Grounded Reinforcement Learning: Learning to Win the Game under Human Commands create mode 100644 data/2022/neurips/Grounded Video Situation Recognition create mode 100644 data/2022/neurips/Grounding Aleatoric Uncertainty for Unsupervised Environment Design create mode 100644 data/2022/neurips/Group Meritocratic Fairness in Linear Contextual Bandits create mode 100644 data/2022/neurips/Grow and Merge: A Unified Framework for Continuous Categories Discovery create mode 100644 data/2022/neurips/Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics create mode 100644 data/2022/neurips/HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions create mode 100644 data/2022/neurips/HF-NeuS: Improved Surface Reconstruction Using High-Frequency Details create mode 100644 data/2022/neurips/HSDF: Hybrid Sign and Distance Field for Modeling Surfaces with Arbitrary Topologies create mode 100644 data/2022/neurips/HSurf-Net: Normal Estimation for 3D Point Clouds by Learning Hyper Surfaces create mode 100644 data/2022/neurips/HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes create mode 100644 data/2022/neurips/HUMUS-Net: Hybrid Unrolled Multi-scale Network 
Architecture for Accelerated MRI Reconstruction create mode 100644 data/2022/neurips/HYPRO: A Hybridly Normalized Probabilistic Model for Long-Horizon Prediction of Event Sequences create mode 100644 data/2022/neurips/Hamiltonian Latent Operators for content and motion disentanglement in image sequences create mode 100644 data/2022/neurips/Hand-Object Interaction Image Generation create mode 100644 data/2022/neurips/HandMeThat: Human-Robot Communication in Physical and Social Environments create mode 100644 data/2022/neurips/Handcrafted Backdoors in Deep Neural Networks create mode 100644 data/2022/neurips/Hard ImageNet: Segmentations for Objects with Strong Spurious Cues create mode 100644 data/2022/neurips/Hardness in Markov Decision Processes: Theory and Practice create mode 100644 data/2022/neurips/Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks create mode 100644 data/2022/neurips/Harmonizing the object recognition strategies of deep neural networks with humans create mode 100644 data/2022/neurips/Heatmap Distribution Matching for Human Pose Estimation create mode 100644 data/2022/neurips/Hedging as Reward Augmentation in Probabilistic Graphical Models create mode 100644 data/2022/neurips/Heterogeneous Skill Learning for Multi-agent Tasks create mode 100644 data/2022/neurips/Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit create mode 100644 data/2022/neurips/Hiding Images in Deep Probabilistic Models create mode 100644 data/2022/neurips/HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representations for Speech Synthesis create mode 100644 data/2022/neurips/Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth create mode 100644 data/2022/neurips/Hierarchical Channel-spatial Encoding for Communication-efficient Collaborative Learning create mode 100644 data/2022/neurips/Hierarchical Graph Transformer with Adaptive Node Sampling create mode 100644 data/2022/neurips/Hierarchical Lattice Layer for Partially Monotone Neural Networks create mode 100644 data/2022/neurips/Hierarchical Normalization for Robust Monocular Depth Estimation create mode 100644 data/2022/neurips/Hierarchical classification at multiple operating points create mode 100644 data/2022/neurips/High-Order Pooling for Graph Neural Networks with Tensor Decomposition create mode 100644 data/2022/neurips/High-dimensional Additive Gaussian Processes under Monotonicity Constraints create mode 100644 data/2022/neurips/High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation create mode 100644 data/2022/neurips/High-dimensional limit theorems for SGD: Effective dynamics and critical scaling create mode 100644 data/2022/neurips/Hilbert Distillation for Cross-Dimensionality Networks create mode 100644 data/2022/neurips/Holomorphic Equilibrium Propagation Computes Exact Gradients Through Finite Size Oscillations create mode 100644 data/2022/neurips/Homomorphic Matrix Completion create mode 100644 data/2022/neurips/Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning create mode 100644 data/2022/neurips/HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions create mode 100644 data/2022/neurips/House of Cans: Covert Transmission of Internal Datasets via Capacity-Aware Neuron Steganography create mode 100644 data/2022/neurips/How Mask Matters: Towards Theoretical Understandings of Masked 
Autoencoders create mode 100644 data/2022/neurips/How Powerful are K-hop Message Passing Graph Neural Networks create mode 100644 data/2022/neurips/How Sampling Impacts the Robustness of Stochastic Neural Networks create mode 100644 data/2022/neurips/How Transferable are Video Representations Based on Synthetic Data? create mode 100644 data/2022/neurips/How Well Do Unsupervised Learning Algorithms Model Human Real-time and Life-long Learning? create mode 100644 data/2022/neurips/How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios create mode 100644 data/2022/neurips/How and Why to Manipulate Your Own Agent: On the Incentives of Users of Learning Agents create mode 100644 data/2022/neurips/How to talk so AI will learn: Instructions, descriptions, and autonomy create mode 100644 data/2022/neurips/Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models create mode 100644 data/2022/neurips/Human-AI Collaborative Bayesian Optimisation create mode 100644 data/2022/neurips/Human-AI Shared Control via Policy Dissection create mode 100644 data/2022/neurips/Human-Robotic Prosthesis as Collaborating Agents for Symmetrical Walking create mode 100644 data/2022/neurips/HumanLiker: A Human-like Object Detector to Model the Manual Labeling Process create mode 100644 data/2022/neurips/Hybrid Neural Autoencoders for Stimulus Encoding in Visual and Other Sensory Neuroprostheses create mode 100644 data/2022/neurips/Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights create mode 100644 data/2022/neurips/HyperDomainNet: Universal Domain Adaptation for Generative Adversarial Networks create mode 100644 data/2022/neurips/HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding create mode 100644 data/2022/neurips/HyperTree Proof Search for Neural Theorem Proving create mode 100644 data/2022/neurips/Hyperbolic Embedding Inference for Structured Multi-Label Prediction create mode 100644 data/2022/neurips/Hyperbolic Feature Augmentation via Distribution Estimation and Infinite Sampling on Manifolds create mode 100644 data/2022/neurips/Hyperparameter Sensitivity in Deep Outlier Detection: Analysis and a Scalable Hyper-Ensemble Solution create mode 100644 data/2022/neurips/Hypothesis Testing for Differentially Private Linear Regression create mode 100644 data/2022/neurips/I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification create mode 100644 data/2022/neurips/I2Q: A Fully Decentralized Q-Learning Algorithm create mode 100644 data/2022/neurips/IKEA-Manual: Seeing Shape Assembly Step by Step create mode 100644 data/2022/neurips/IM-Loss: Information Maximization Loss for Spiking Neural Networks create mode 100644 data/2022/neurips/IMED-RL: Regret optimal learning of ergodic Markov decision processes create mode 100644 data/2022/neurips/INRAS: Implicit Neural Representation for Audio Scenes create mode 100644 data/2022/neurips/Identifiability and generalizability from multiple experts in Inverse Reinforcement Learning create mode 100644 data/2022/neurips/Identifiability of deep generative models without auxiliary information create mode 100644 data/2022/neurips/Identification, Amplification and Measurement: A bridge to Gaussian Differential Privacy create mode 100644 data/2022/neurips/Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials create mode 100644 data/2022/neurips/If Influence Functions are the Answer, Then What is the Question? 
create mode 100644 data/2022/neurips/Imbalance Trouble: Revisiting Neural-Collapse Geometry create mode 100644 data/2022/neurips/Imitating Past Successes can be Very Suboptimal create mode 100644 data/2022/neurips/Implications of Model Indeterminacy for Explanations of Automated Decisions create mode 100644 data/2022/neurips/Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent create mode 100644 data/2022/neurips/Implicit Neural Representations with Levels-of-Experts create mode 100644 data/2022/neurips/Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions create mode 100644 data/2022/neurips/Implicit Warping for Animation with Image Sets create mode 100644 data/2022/neurips/Improved Algorithms for Neural Active Learning create mode 100644 data/2022/neurips/Improved Bounds on Neural Complexity for Representing Piecewise Linear Functions create mode 100644 data/2022/neurips/Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization create mode 100644 data/2022/neurips/Improved Coresets for Euclidean k-Means create mode 100644 data/2022/neurips/Improved Differential Privacy for SGD via Optimal Private Linear Operators on Adaptive Streams create mode 100644 data/2022/neurips/Improved Feature Distillation via Projector Ensemble create mode 100644 data/2022/neurips/Improved Fine-Tuning by Better Leveraging Pre-Training Data create mode 100644 data/2022/neurips/Improved Imaging by Invex Regularizers with Global Optima Guarantees create mode 100644 data/2022/neurips/Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs create mode 100644 data/2022/neurips/Improved Utility Analysis of Private CountSketch create mode 100644 data/2022/neurips/Improved techniques for deterministic l2 robustness create mode 100644 data/2022/neurips/Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator create mode 100644 data/2022/neurips/Improving Barely Supervised Learning by Discriminating Unlabeled Samples with Super-Class create mode 100644 data/2022/neurips/Improving Certified Robustness via Statistical Learning with Logical Reasoning create mode 100644 data/2022/neurips/Improving Diffusion Models for Inverse Problems using Manifold Constraints create mode 100644 data/2022/neurips/Improving GANs with A Dynamic Discriminator create mode 100644 data/2022/neurips/Improving Generative Adversarial Networks via Adversarial Learning in Latent Space create mode 100644 data/2022/neurips/Improving Intrinsic Exploration with Language Abstractions create mode 100644 data/2022/neurips/Improving Neural Ordinary Differential Equations with Nesterov's Accelerated Gradient Method create mode 100644 data/2022/neurips/Improving Out-of-Distribution Generalization by Adversarial Training with Structured Priors create mode 100644 data/2022/neurips/Improving Policy Learning via Language Dynamics Distillation create mode 100644 data/2022/neurips/Improving Self-Supervised Learning by Characterizing Idealized Representations create mode 100644 data/2022/neurips/Improving Task-Specific Generalization in Few-Shot Learning via Adaptive Vicinal Risk Minimization create mode 100644 data/2022/neurips/Improving Transformer with an Admixture of Attention Heads create mode 100644 data/2022/neurips/Improving Variational Autoencoders with Density Gap-based Regularization create mode 100644 data/2022/neurips/Improving Zero-Shot 
Generalization in Offline Reinforcement Learning using Generalized Similarity Functions create mode 100644 data/2022/neurips/In Defense of the Unitary Scalarization for Deep Multi-Task Learning create mode 100644 data/2022/neurips/In Differential Privacy, There is Truth: on Vote-Histogram Leakage in Ensemble Private Learning create mode 100644 data/2022/neurips/In What Ways Are Deep Neural Networks Invariant and How Should We Measure This? create mode 100644 data/2022/neurips/In the Eye of the Beholder: Robust Prediction with Causal User Modeling create mode 100644 data/2022/neurips/Incentivizing Combinatorial Bandit Exploration create mode 100644 data/2022/neurips/Inception Transformer create mode 100644 data/2022/neurips/Incorporating Bias-aware Margins into Contrastive Loss for Collaborative Filtering create mode 100644 data/2022/neurips/Increasing Confidence in Adversarial Robustness Evaluations create mode 100644 data/2022/neurips/Increasing the Scope as You Learn: Adaptive Bayesian Optimization in Nested Subspaces create mode 100644 data/2022/neurips/Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards create mode 100644 data/2022/neurips/Independence Testing for Bounded Degree Bayesian Networks create mode 100644 data/2022/neurips/Independence Testing-Based Approach to Causal Discovery under Measurement Error and Linear Non-Gaussian Models create mode 100644 data/2022/neurips/Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples create mode 100644 data/2022/neurips/Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence create mode 100644 data/2022/neurips/Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network? 
create mode 100644 data/2022/neurips/Inductive Logical Query Answering in Knowledge Graphs create mode 100644 data/2022/neurips/Inference and Sampling for Archimax Copulas create mode 100644 data/2022/neurips/Infinite Recommendation Networks: A Data-Centric Approach create mode 100644 data/2022/neurips/Infinite-Fidelity Coregionalization for Physical Simulation create mode 100644 data/2022/neurips/Influencing Long-Term Behavior in Multiagent Reinforcement Learning create mode 100644 data/2022/neurips/Information bottleneck theory of high-dimensional regression: relevancy, efficiency and optimality create mode 100644 data/2022/neurips/Information-Theoretic GAN Compression with Variational Energy-based Model create mode 100644 data/2022/neurips/Information-Theoretic Safe Exploration with Gaussian Processes create mode 100644 data/2022/neurips/Inherently Explainable Reinforcement Learning in Natural Language create mode 100644 data/2022/neurips/Injecting Domain Knowledge from Empirical Interatomic Potentials to Neural Networks for Predicting Material Properties create mode 100644 data/2022/neurips/InsNet: An Efficient, Flexible, and Performant Insertion-based Text Generation Model create mode 100644 data/2022/neurips/InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation create mode 100644 data/2022/neurips/Insights into Pre-training via Simpler Synthetic Tasks create mode 100644 data/2022/neurips/Instability and Local Minima in GAN Training with Kernel Discriminators create mode 100644 data/2022/neurips/Instance-Based Uncertainty Estimation for Gradient-Boosted Regression Trees create mode 100644 data/2022/neurips/Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design create mode 100644 data/2022/neurips/Instance-based Learning for Knowledge Base Completion create mode 100644 data/2022/neurips/Instance-optimal PAC Algorithms for Contextual Bandits create mode 100644 data/2022/neurips/Integral Probability Metrics PAC-Bayes Bounds create mode 100644 data/2022/neurips/Interaction Modeling with Multiplex Attention create mode 100644 data/2022/neurips/Interaction-Grounded Learning with Action-Inclusive Feedback create mode 100644 data/2022/neurips/Intermediate Prototype Mining Transformer for Few-Shot Semantic Segmentation create mode 100644 data/2022/neurips/Interpolation and Regularization for Causal Learning create mode 100644 data/2022/neurips/Interpreting Operation Selection in Differentiable Architecture Search: A Perspective from Influence-Directed Explanations create mode 100644 data/2022/neurips/Interventions, Where and How? 
Experimental Design for Causal Models at Scale create mode 100644 data/2022/neurips/Intra-agent speech permits zero-shot task acquisition create mode 100644 data/2022/neurips/Intrinsic dimensionality estimation using Normalizing Flows create mode 100644 data/2022/neurips/Introspective Learning : A Two-Stage approach for Inference in Neural Networks create mode 100644 data/2022/neurips/Invariance Learning based on Label Hierarchy create mode 100644 data/2022/neurips/Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations create mode 100644 data/2022/neurips/Invariance-Aware Randomized Smoothing Certificates create mode 100644 data/2022/neurips/Invariant and Transportable Representations for Anti-Causal Domain Shifts create mode 100644 data/2022/neurips/Inverse Design for Fluid-Structure Interactions using Graph Network Simulators create mode 100644 data/2022/neurips/Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality create mode 100644 data/2022/neurips/Invertible Monotone Operators for Normalizing Flows create mode 100644 data/2022/neurips/Is Integer Arithmetic Enough for Deep Learning Training? create mode 100644 data/2022/neurips/Is Out-of-Distribution Detection Learnable? create mode 100644 data/2022/neurips/Is Sortition Both Representative and Fair? create mode 100644 data/2022/neurips/Is a Modular Architecture Enough? create mode 100644 data/2022/neurips/Is one annotation enough? - A data-centric image classification benchmark for noisy and ambiguous label estimation create mode 100644 data/2022/neurips/Is this the Right Neighborhood? Accurate and Query Efficient Model Agnostic Explanations create mode 100644 data/2022/neurips/Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models create mode 100644 data/2022/neurips/Isometric 3D Adversarial Examples in the Physical World create mode 100644 data/2022/neurips/Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments create mode 100644 data/2022/neurips/Iterative Scene Graph Generation create mode 100644 data/2022/neurips/Iterative Structural Inference of Directed Graphs create mode 100644 data/2022/neurips/JAHS-Bench-201: A Foundation For Research On Joint Architecture And Hyperparameter Search create mode 100644 data/2022/neurips/JAWS: Auditing Predictive Uncertainty Under Covariate Shift create mode 100644 data/2022/neurips/Joint Entropy Search For Maximally-Informed Bayesian Optimization create mode 100644 data/2022/neurips/Joint Entropy Search for Multi-Objective Bayesian Optimization create mode 100644 data/2022/neurips/Joint Learning of 2D-3D Weakly Supervised Semantic Segmentation create mode 100644 data/2022/neurips/Jump Self-attention: Capturing High-order Statistics in Transformers create mode 100644 data/2022/neurips/K-LITE: Learning Transferable Visual Models with External Knowledge create mode 100644 data/2022/neurips/K-Radar: 4D Radar Object Detection for Autonomous Driving in Various Weather Conditions create mode 100644 data/2022/neurips/KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation create mode 100644 data/2022/neurips/KSD Aggregated Goodness-of-fit Test create mode 100644 data/2022/neurips/Kantorovich Strikes Back! Wasserstein GANs are not Optimal Transport? 
create mode 100644 data/2022/neurips/Kernel Interpolation with Sparse Grids create mode 100644 data/2022/neurips/Kernel Memory Networks: A Unifying Framework for Memory Modeling create mode 100644 data/2022/neurips/Kernel Multimodal Continuous Attention create mode 100644 data/2022/neurips/Kernel similarity matching with Hebbian networks create mode 100644 data/2022/neurips/Keypoint-Guided Optimal Transport with Applications in Heterogeneous Domain Adaptation create mode 100644 data/2022/neurips/Knowledge Distillation Improves Graph Structure Augmentation for Graph Neural Networks create mode 100644 data/2022/neurips/Knowledge Distillation from A Stronger Teacher create mode 100644 data/2022/neurips/Knowledge Distillation: Bad Models Can Be Good Role Models create mode 100644 data/2022/neurips/Knowledge-Aware Bayesian Deep Topic Model create mode 100644 data/2022/neurips/LAION-5B: An open large-scale dataset for training next generation image-text models create mode 100644 data/2022/neurips/LAMP: Extracting Text from Gradients with Language Model Priors create mode 100644 data/2022/neurips/LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning create mode 100644 data/2022/neurips/LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery create mode 100644 data/2022/neurips/LBD: Decouple Relevance and Observation for Individual-Level Unbiased Learning to Rank create mode 100644 data/2022/neurips/LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning create mode 100644 data/2022/neurips/LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward create mode 100644 data/2022/neurips/LGDN: Language-Guided Denoising Network for Video-Language Modeling create mode 100644 data/2022/neurips/LIFT: Language-Interfaced Fine-Tuning for Non-language Machine Learning Tasks create mode 100644 data/2022/neurips/LION: Latent Point Diffusion Models for 3D Shape Generation create mode 100644 data/2022/neurips/LIPS - Learning Industrial Physical Simulation benchmark suite create mode 100644 data/2022/neurips/LISA: Learning Interpretable Skill Abstractions from Language create mode 100644 data/2022/neurips/LOG: Active Model Adaptation for Label-Efficient OOD Generalization create mode 100644 data/2022/neurips/LOT: Layer-wise Orthogonal Training on Improving l2 Certified Robustness create mode 100644 data/2022/neurips/LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning create mode 100644 data/2022/neurips/LTMD: Learning Improvement of Spiking Neural Networks with Learnable Thresholding Neurons and Moderate Dropout create mode 100644 data/2022/neurips/Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting create mode 100644 data/2022/neurips/Label-Aware Global Consistency for Multi-Label Learning with Single Positive Labels create mode 100644 data/2022/neurips/Label-invariant Augmentation for Semi-Supervised Graph Classification create mode 100644 data/2022/neurips/Langevin Autoencoders for Learning Deep Latent Variable Models create mode 100644 data/2022/neurips/Language Conditioned Spatial Relation Reasoning for 3D Object Grounding create mode 100644 data/2022/neurips/Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners create mode 100644 data/2022/neurips/Laplacian Autoencoders for Learning Stochastic Representations create mode 100644 data/2022/neurips/Large Language Models are Zero-Shot Reasoners create mode 100644 
data/2022/neurips/Large-Scale Differentiable Causal Discovery of Factor Graphs create mode 100644 data/2022/neurips/Large-Scale Retrieval for Reinforcement Learning create mode 100644 data/2022/neurips/Large-batch Optimization for Dense Visual Predictions: Training Faster R-CNN in 4.2 Minutes create mode 100644 data/2022/neurips/Large-scale Optimization of Partial AUC in a Range of False Positive Rates create mode 100644 data/2022/neurips/LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model create mode 100644 data/2022/neurips/Last-Iterate Convergence of Optimistic Gradient Method for Monotone Variational Inequalities create mode 100644 data/2022/neurips/Latency-aware Spatial-wise Dynamic Networks create mode 100644 data/2022/neurips/Latent Hierarchical Causal Structure Discovery with Rank Constraints create mode 100644 data/2022/neurips/Latent Planning via Expansive Tree Search create mode 100644 data/2022/neurips/Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training create mode 100644 data/2022/neurips/Lazy and Fast Greedy MAP Inference for Determinantal Point Process create mode 100644 data/2022/neurips/Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering create mode 100644 data/2022/neurips/Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets create mode 100644 data/2022/neurips/Learn what matters: cross-domain imitation learning with task-relevant embeddings create mode 100644 data/2022/neurips/Learnable Polyphase Sampling for Shift Invariant and Equivariant Convolutional Networks create mode 100644 data/2022/neurips/Learning (Very) Simple Generative Models Is Hard create mode 100644 data/2022/neurips/Learning Active Camera for Multi-Object Navigation create mode 100644 data/2022/neurips/Learning Articulated Rigid Body Dynamics with Lagrangian Graph Neural Network create mode 100644 data/2022/neurips/Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation create mode 100644 data/2022/neurips/Learning Best Combination for Efficient N: M Sparsity create mode 100644 data/2022/neurips/Learning Bipartite Graphs: Heavy Tails and Multiple Components create mode 100644 data/2022/neurips/Learning Causally Invariant Representations for Out-of-Distribution Generalization on Graphs create mode 100644 data/2022/neurips/Learning Chaotic Dynamics in Dissipative Systems create mode 100644 data/2022/neurips/Learning Concept Credible Models for Mitigating Shortcuts create mode 100644 data/2022/neurips/Learning Consistency-Aware Unsigned Distance Functions Progressively from Raw Point Clouds create mode 100644 data/2022/neurips/Learning Contrastive Embedding in Low-Dimensional Space create mode 100644 data/2022/neurips/Learning Debiased Classifier with Biased Committee create mode 100644 data/2022/neurips/Learning Deep Input-Output Stable Dynamics create mode 100644 data/2022/neurips/Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization create mode 100644 data/2022/neurips/Learning Distinct and Representative Modes for Image Captioning create mode 100644 data/2022/neurips/Learning Distributed and Fair Policies for Network Load Balancing as Markov Potential Game create mode 100644 data/2022/neurips/Learning Distributions Generated by Single-Layer ReLU Networks in the Presence of Arbitrary Outliers create mode 100644 data/2022/neurips/Learning Dynamical Systems via Koopman Operator Regression in 
Reproducing Kernel Hilbert Spaces create mode 100644 data/2022/neurips/Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation create mode 100644 data/2022/neurips/Learning Energy Networks with Generalized Fenchel-Young Losses create mode 100644 data/2022/neurips/Learning Enhanced Representation for Tabular Data via Neighborhood Propagation create mode 100644 data/2022/neurips/Learning Equivariant Segmentation with Instance-Unique Querying create mode 100644 data/2022/neurips/Learning Expressive Meta-Representations with Mixture of Expert Neural Processes create mode 100644 data/2022/neurips/Learning Fractional White Noises in Neural Stochastic Differential Equations create mode 100644 data/2022/neurips/Learning General World Models in a Handful of Reward-Free Deployments create mode 100644 data/2022/neurips/Learning Generalizable Models for Vehicle Routing Problems via Knowledge Distillation create mode 100644 data/2022/neurips/Learning Generalizable Part-based Feature Representation for 3D Point Clouds create mode 100644 data/2022/neurips/Learning Generalized Policy Automata for Relational Stochastic Shortest Path Problems create mode 100644 data/2022/neurips/Learning Graph-embedded Key-event Back-tracing for Object Tracking in Event Clouds create mode 100644 data/2022/neurips/Learning Individualized Treatment Rules with Many Treatments: A Supervised Clustering Approach Using Adaptive Fusion create mode 100644 data/2022/neurips/Learning Infinite-Horizon Average-Reward Restless Multi-Action Bandits via Index Awareness create mode 100644 data/2022/neurips/Learning Interface Conditions in Domain Decomposition Solvers create mode 100644 data/2022/neurips/Learning Invariant Graph Representations for Out-of-Distribution Generalization create mode 100644 data/2022/neurips/Learning Latent Seasonal-Trend Representations for Time Series Forecasting create mode 100644 data/2022/neurips/Learning Long-Term Crop Management Strategies with CyclesGym create mode 100644 data/2022/neurips/Learning Manifold Dimensions with Conditional Variational Autoencoders create mode 100644 data/2022/neurips/Learning Mixed Multinomial Logits with Provable Guarantees create mode 100644 data/2022/neurips/Learning Modular Simulations for Homogeneous Systems create mode 100644 data/2022/neurips/Learning Multi-resolution Functional Maps with Spectral Attention for Robust Shape Matching create mode 100644 data/2022/neurips/Learning NP-Hard Multi-Agent Assignment Planning using GNN: Inference on a Random Graph and Provable Auction-Fitted Q-learning create mode 100644 data/2022/neurips/Learning Neural Acoustic Fields create mode 100644 data/2022/neurips/Learning Neural Set Functions Under the Optimal Subset Oracle create mode 100644 data/2022/neurips/Learning Optical Flow from Continuous Spike Streams create mode 100644 data/2022/neurips/Learning Optimal Flows for Non-Equilibrium Importance Sampling create mode 100644 data/2022/neurips/Learning Options via Compression create mode 100644 data/2022/neurips/Learning Partial Equivariances From Data create mode 100644 data/2022/neurips/Learning Physical Dynamics with Subequivariant Graph Neural Networks create mode 100644 data/2022/neurips/Learning Physics Constrained Dynamics Using Autoencoders create mode 100644 data/2022/neurips/Learning Predictions for Algorithms with Predictions create mode 100644 data/2022/neurips/Learning Probabilistic Models from Generator Latent Spaces with Hat EBM create mode 100644 data/2022/neurips/Learning Recourse on Instance 
Environment to Enhance Prediction Accuracy create mode 100644 data/2022/neurips/Learning Representations via a Robust Behavioral Metric for Deep Reinforcement Learning create mode 100644 data/2022/neurips/Learning Robust Dynamics through Variational Sparse Gating create mode 100644 data/2022/neurips/Learning Robust Rule Representations for Abstract Reasoning via Internal Inferences create mode 100644 data/2022/neurips/Learning State-Aware Visual Representations from Audible Interactions create mode 100644 data/2022/neurips/Learning Structure from the Ground up - Hierarchical Representation Learning by Chunking create mode 100644 data/2022/neurips/Learning Substructure Invariance for Out-of-Distribution Molecular Representations create mode 100644 data/2022/neurips/Learning Superpoint Graph Cut for 3D Instance Segmentation create mode 100644 data/2022/neurips/Learning Symmetric Rules with SATNet create mode 100644 data/2022/neurips/Learning Tractable Probabilistic Models from Inconsistent Local Estimates create mode 100644 data/2022/neurips/Learning Two-Player Markov Games: Neural Function Approximation and Correlated Equilibrium create mode 100644 data/2022/neurips/Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space create mode 100644 data/2022/neurips/Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning create mode 100644 data/2022/neurips/Learning and Covering Sums of Independent Random Variables with Unbounded Support create mode 100644 data/2022/neurips/Learning dynamics of deep linear networks with multiple pathways create mode 100644 data/2022/neurips/Learning from Distributed Users in Contextual Linear Bandits Without Sharing the Context create mode 100644 data/2022/neurips/Learning from Few Samples: Transformation-Invariant SVMs with Composition and Locality at Multiple Scales create mode 100644 data/2022/neurips/Learning from Future: A Novel Self-Training Framework for Semantic Segmentation create mode 100644 data/2022/neurips/Learning from Label Proportions by Learning with Label Noise create mode 100644 data/2022/neurips/Learning from Stochastically Revealed Preference create mode 100644 data/2022/neurips/Learning from a Sample in Online Algorithms create mode 100644 data/2022/neurips/Learning in Congestion Games with Bandit Feedback create mode 100644 data/2022/neurips/Learning in Observable POMDPs, without Computationally Intractable Oracles create mode 100644 data/2022/neurips/Learning interacting dynamical systems with latent Gaussian process ODEs create mode 100644 data/2022/neurips/Learning low-dimensional generalizable natural features from retina using a U-net create mode 100644 data/2022/neurips/Learning on Arbitrary Graph Topologies via Predictive Coding create mode 100644 data/2022/neurips/Learning on the Edge: Online Learning with Stochastic Feedback Graphs create mode 100644 data/2022/neurips/Learning single-index models with shallow neural networks create mode 100644 data/2022/neurips/Learning sparse features can lead to overfitting in neural networks create mode 100644 data/2022/neurips/Learning the Structure of Large Networked Systems Obeying Conservation Laws create mode 100644 data/2022/neurips/Learning to Accelerate Partial Differential Equations via Latent Global Evolution create mode 100644 data/2022/neurips/Learning to Attack Federated Learning: A Model-based Reinforcement Learning Attack Framework create mode 100644 data/2022/neurips/Learning to Branch with Tree MDPs create mode 100644 
data/2022/neurips/Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation create mode 100644 data/2022/neurips/Learning to Compare Nodes in Branch and Bound with Graph Neural Networks create mode 100644 data/2022/neurips/Learning to Configure Computer Networks with Neural Algorithmic Reasoning create mode 100644 data/2022/neurips/Learning to Constrain Policy Optimization with Virtual Trust Region create mode 100644 data/2022/neurips/Learning to Discover and Detect Objects create mode 100644 data/2022/neurips/Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs create mode 100644 data/2022/neurips/Learning to Follow Instructions in Text-Based Games create mode 100644 data/2022/neurips/Learning to Generate Inversion-Resistant Model Explanations create mode 100644 data/2022/neurips/Learning to Mitigate AI Collusion on Economic Platforms create mode 100644 data/2022/neurips/Learning to Navigate Wikipedia by Taking Random Walks create mode 100644 data/2022/neurips/Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification create mode 100644 data/2022/neurips/Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures create mode 100644 data/2022/neurips/Learning to Reconstruct Missing Data from Spatiotemporal Graphs with Sparse Observations create mode 100644 data/2022/neurips/Learning to Sample and Aggregate: Few-shot Reasoning over Temporal Knowledge Graphs create mode 100644 data/2022/neurips/Learning to Scaffold: Optimizing Model Explanations for Teaching create mode 100644 data/2022/neurips/Learning to Share in Networked Multi-Agent Reinforcement Learning create mode 100644 data/2022/neurips/Learning with convolution and pooling operations in kernel methods create mode 100644 data/2022/neurips/Learning with little mixing create mode 100644 data/2022/neurips/Learning-Augmented Algorithms for Online Linear and Semidefinite Programming create mode 100644 data/2022/neurips/Learning-based Motion Planning in Dynamic Environments Using GNNs and Temporal Encoding create mode 100644 data/2022/neurips/Left Heavy Tails and the Effectiveness of the Policy and Value Networks in DNN-based best-first search for Sokoban Planning create mode 100644 data/2022/neurips/Less-forgetting Multi-lingual Fine-tuning create mode 100644 data/2022/neurips/Let Images Give You More: Point Cloud Cross-Modal Training for Shape Analysis create mode 100644 data/2022/neurips/Lethal Dose Conjecture on Data Poisoning create mode 100644 data/2022/neurips/Leveraging Factored Action Spaces for Efficient Offline Reinforcement Learning in Healthcare create mode 100644 data/2022/neurips/Leveraging Inter-Layer Dependency for Post -Training Quantization create mode 100644 data/2022/neurips/Leveraging the Hints: Adaptive Bidding in Repeated First-Price Auctions create mode 100644 data/2022/neurips/LieGG: Studying Learned Lie Group Generators create mode 100644 data/2022/neurips/Lifelong Neural Predictive Coding: Learning Cumulatively Online without Forgetting create mode 100644 data/2022/neurips/Lifting Weak Supervision To Structured Prediction create mode 100644 data/2022/neurips/Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits create mode 100644 data/2022/neurips/Linear Label Ranking with Bounded Noise create mode 100644 data/2022/neurips/Linear tree shap create mode 100644 data/2022/neurips/Lipschitz Bandits with Batched Feedback create mode 100644 
data/2022/neurips/List-Decodable Sparse Mean Estimation
 create mode 100644 data/2022/neurips/List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering
 create mode 100644 data/2022/neurips/Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF
 create mode 100644 data/2022/neurips/LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models
 create mode 100644 data/2022/neurips/LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
 create mode 100644 data/2022/neurips/Local Bayesian optimization via maximizing probability of descent
 create mode 100644 data/2022/neurips/Local Identifiability of Deep ReLU Neural Networks: the Theory
 create mode 100644 data/2022/neurips/Local Latent Space Bayesian Optimization over Structured Inputs
 create mode 100644 data/2022/neurips/Local Linear Convergence of Gradient Methods for Subspace Optimization via Strict Complementarity
 create mode 100644 data/2022/neurips/Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions
 create mode 100644 data/2022/neurips/Local Spatiotemporal Representation Learning for Longitudinally-consistent Neuroimage Analysis
 create mode 100644 data/2022/neurips/Local-Global MCMC kernels: the best of both worlds
 create mode 100644 data/2022/neurips/Locally Hierarchical Auto-Regressive Modeling for Image Generation
 create mode 100644 data/2022/neurips/Locating and Editing Factual Associations in GPT
 create mode 100644 data/2022/neurips/Log-Concave and Multivariate Canonical Noise Distributions for Differential Privacy
 create mode 100644 data/2022/neurips/Log-Linear-Time Gaussian Processes Using Binary Tree Kernels
 create mode 100644 data/2022/neurips/Log-Polar Space Convolution Layers
 create mode 100644 data/2022/neurips/LogiGAN: Learning Logical Reasoning via Adversarial Pre-training
 create mode 100644 data/2022/neurips/Logical Activation Functions: Logit-space equivalents of Probabilistic Boolean Operators
 create mode 100644 data/2022/neurips/Logical Credal Networks
 create mode 100644 data/2022/neurips/Long Range Graph Benchmark
 create mode 100644 data/2022/neurips/Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
 create mode 100644 data/2022/neurips/Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding
 create mode 100644 data/2022/neurips/Look More but Care Less in Video Recognition
 create mode 100644 data/2022/neurips/Look where you look! Saliency-guided Q-networks for generalization in visual Reinforcement Learning
 create mode 100644 data/2022/neurips/Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
 create mode 100644 data/2022/neurips/Lost in Latent Space: Examining failures of disentangled models at combinatorial generalisation
 create mode 100644 data/2022/neurips/Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks
 create mode 100644 data/2022/neurips/Low-Rank Modular Reinforcement Learning via Muscle Synergy
 create mode 100644 data/2022/neurips/Low-rank Optimal Transport: Approximation, Statistics and Debiasing
 create mode 100644 data/2022/neurips/Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations
 create mode 100644 data/2022/neurips/Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with Communication Compression
 create mode 100644 data/2022/neurips/Lower Bounds on Randomly Preconditioned Lasso via Robust Sparse Designs
 create mode 100644 data/2022/neurips/Luckiness in Multiscale Online Learning
 create mode 100644 data/2022/neurips/M2N: Mesh Movement Networks for PDE Solvers
 create mode 100644 data/2022/neurips/M4Singer: A Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus
 create mode 100644 data/2022/neurips/MABSplit: Faster Forest Training Using Multi-Armed Bandits
 create mode 100644 data/2022/neurips/MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields
 create mode 100644 data/2022/neurips/MACK: Multimodal Aligned Conceptual Knowledge for Unpaired Image-text Matching
 create mode 100644 data/2022/neurips/MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control
 create mode 100644 data/2022/neurips/MAgNet: Mesh Agnostic Neural PDE Solver
 create mode 100644 data/2022/neurips/MAtt: A Manifold Attention Network for EEG Decoding
 create mode 100644 data/2022/neurips/MBW: Multi-view Bootstrapping in the Wild
 create mode 100644 data/2022/neurips/MCL-GAN: Generative Adversarial Networks with Multiple Specialized Discriminators
 create mode 100644 data/2022/neurips/MCMAE: Masked Convolution Meets Masked Autoencoders
 create mode 100644 data/2022/neurips/MCVD - Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
 create mode 100644 data/2022/neurips/MEMO: Test Time Robustness via Adaptation and Augmentation
 create mode 100644 data/2022/neurips/METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets
 create mode 100644 data/2022/neurips/MExMI: Pool-based Active Model Extraction Crossover Membership Inference
 create mode 100644 data/2022/neurips/MGNNI: Multiscale Graph Neural Networks with Implicit Layers
 create mode 100644 data/2022/neurips/MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing
 create mode 100644 data/2022/neurips/MORA: Improving Ensemble Robustness Evaluation with Model Reweighing Attack
 create mode 100644 data/2022/neurips/MOVE: Unsupervised Movable Object Segmentation and Detection
 create mode 100644 data/2022/neurips/MSDS: A Large-Scale Chinese Signature and Token Digit String Dataset for Handwriting Verification
 create mode 100644 data/2022/neurips/MTNeuro: A Benchmark for Evaluating Representations of Brain Structure Across Multiple Levels of Abstraction
 create mode 100644 data/2022/neurips/MVP-N: A Dataset and Benchmark for Real-World Multi-View Object Classification
 create mode 100644 data/2022/neurips/Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
 create mode 100644 data/2022/neurips/Make Some Noise: Reliable and Efficient Single-Step Adversarial Training
 create mode 100644 data/2022/neurips/Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis
 create mode 100644 data/2022/neurips/Making Look-Ahead Active Learning Strategies Feasible with Neural Tangent Kernels
 create mode 100644 data/2022/neurips/Making Sense of Dependence: Efficient Black-box Explanations Using Dependence Measure
 create mode 100644 data/2022/neurips/Manifold Interpolating Optimal-Transport Flows for Trajectory Inference
 create mode 100644 data/2022/neurips/Margin-Based Few-Shot Class-Incremental Learning with Class-Level Overfitting Mitigation
 create mode 100644 data/2022/neurips/Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients
 create mode 100644 data/2022/neurips/Markovian Interference in Experiments
 create mode 100644 data/2022/neurips/Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class
 create mode 100644 data/2022/neurips/Mask Matching Transformer for Few-Shot Segmentation
 create mode 100644 data/2022/neurips/Mask-based Latent Reconstruction for Reinforcement Learning
 create mode 100644 data/2022/neurips/MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning
 create mode 100644 data/2022/neurips/MaskTune: Mitigating Spurious Correlations by Forcing to Explore
 create mode 100644 data/2022/neurips/Masked Autoencoders As Spatiotemporal Learners
 create mode 100644 data/2022/neurips/Masked Autoencoders that Listen
 create mode 100644 data/2022/neurips/Masked Autoencoding for Scalable and Generalizable Decision Making
 create mode 100644 data/2022/neurips/Masked Generative Adversarial Networks are Data-Efficient Generation Learners
 create mode 100644 data/2022/neurips/Masked Prediction: A Parameter Identifiability View
 create mode 100644 data/2022/neurips/Matching in Multi-arm Bandit with Collision
 create mode 100644 data/2022/neurips/Matrix Multiplicative Weights Updates in Quantum Zero-Sum Games: Conservation Laws & Recurrence
 create mode 100644 data/2022/neurips/Matryoshka Representation Learning
 create mode 100644 data/2022/neurips/Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification
 create mode 100644 data/2022/neurips/Maximizing Revenue under Market Shrinkage and Market Uncertainty
 create mode 100644 data/2022/neurips/Maximizing and Satisficing in Multi-armed Bandits with Graph Information
 create mode 100644 data/2022/neurips/Maximum Class Separation as Inductive Bias in One Matrix
 create mode 100644 data/2022/neurips/Maximum Common Subgraph Guided Graph Retrieval: Late and Early Interaction Networks
 create mode 100644 data/2022/neurips/Maximum Likelihood Training of Implicit Nonlinear Diffusion Model
 create mode 100644 data/2022/neurips/Maximum a posteriori natural scene reconstruction from retinal ganglion cells with deep denoiser priors
 create mode 100644 data/2022/neurips/Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees
 create mode 100644 data/2022/neurips/Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture Models
 create mode 100644 data/2022/neurips/Mean Estimation with User-level Privacy under Data Heterogeneity
 create mode 100644 data/2022/neurips/Measures of Information Reflect Memorization Patterns
 create mode 100644 data/2022/neurips/Measuring Data Reconstruction Defenses in Collaborative Inference Systems
 create mode 100644 data/2022/neurips/Measuring and Reducing Model Update Regression in Structured Prediction for NLP
 create mode 100644 data/2022/neurips/Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
 create mode 100644 data/2022/neurips/Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
 create mode 100644 data/2022/neurips/Memory Efficient Continual Learning with Transformers
 create mode 100644 data/2022/neurips/Memory safe computations with XLA compiler
 create mode 100644 data/2022/neurips/Merging Models with Fisher-Weighted Averaging
 create mode 100644 data/2022/neurips/Mesoscopic modeling of hidden spiking neurons
 create mode 100644 data/2022/neurips/Meta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach
 create mode 100644 data/2022/neurips/Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification
 create mode 100644 data/2022/neurips/Meta-Auto-Decoder for Solving Parametric Partial Differential Equations
 create mode 100644 data/2022/neurips/Meta-Complementing the Semantics of Short Texts in Neural Topic Models
 create mode 100644 data/2022/neurips/Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts
 create mode 100644 data/2022/neurips/Meta-Learning Dynamics Forecasting Using Task Inference
 create mode 100644 data/2022/neurips/Meta-Learning with Self-Improving Momentum Target
 create mode 100644 data/2022/neurips/Meta-Query-Net: Resolving Purity-Informativeness Dilemma in Open-set Active Learning
 create mode 100644 data/2022/neurips/Meta-Reinforcement Learning with Self-Modifying Networks
 create mode 100644 data/2022/neurips/Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning
 create mode 100644 data/2022/neurips/Meta-ticket: Finding optimal subnetworks for few-shot learning within randomly initialized neural networks
 create mode 100644 data/2022/neurips/MetaMask: Revisiting Dimensional Confounder for Self-Supervised Learning
 create mode 100644 data/2022/neurips/MetaTeacher: Coordinating Multi-Model Domain Adaptation for Medical Image Classification
 create mode 100644 data/2022/neurips/MetricFormer: A Unified Perspective of Correlation Exploring in Similarity Learning
 create mode 100644 data/2022/neurips/Micro and Macro Level Graph Modeling for Graph Variational Auto-Encoders
 create mode 100644 data/2022/neurips/Mildly Conservative Q-Learning for Offline Reinforcement Learning
 create mode 100644 data/2022/neurips/MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
 create mode 100644 data/2022/neurips/Mind Reader: Reconstructing complex images from brain activities
 create mode 100644 data/2022/neurips/Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
 create mode 100644 data/2022/neurips/MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
 create mode 100644 data/2022/neurips/Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning
 create mode 100644 data/2022/neurips/Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification
 create mode 100644 data/2022/neurips/Minimax Optimal Fixed-Budget Best Arm Identification in Linear Bandits
 create mode 100644 data/2022/neurips/Minimax Optimal Online Imitation Learning via Replay Estimation
 create mode 100644 data/2022/neurips/Minimax Regret for Cascading Bandits
 create mode 100644 data/2022/neurips/Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model
 create mode 100644 data/2022/neurips/Mining Multi-Label Samples from Single Positive Labels
 create mode 100644 data/2022/neurips/Mining Unseen Classes via Regional Objectness: A Simple Baseline for Incremental Segmentation
 create mode 100644 data/2022/neurips/Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently
 create mode 100644 data/2022/neurips/Mirror Descent with Relative Smoothness in Measure Spaces, with application to Sinkhorn and EM
 create mode 100644 data/2022/neurips/Mismatched No More: Joint Model-Policy Optimization for Model-Based RL
 create mode 100644 data/2022/neurips/MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models
 create mode 100644 data/2022/neurips/Missing Data Imputation and Acquisition with Deep Hierarchical Models and Hamiltonian Monte Carlo
 create mode 100644 data/2022/neurips/Misspecified Phase Retrieval with Generative Priors
 create mode 100644 data/2022/neurips/Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization
 create mode 100644 data/2022/neurips/Mixture-of-Experts with Expert Choice Routing
 create mode 100644 data/2022/neurips/MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control
 create mode 100644 data/2022/neurips/MoCoDA: Model-based Counterfactual Data Augmentation
 create mode 100644 data/2022/neurips/MoGDE: Boosting Mobile Monocular 3D Object Detection with Ground Depth Estimation
 create mode 100644 data/2022/neurips/MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation
 create mode 100644 data/2022/neurips/Model Preserving Compression for Neural Networks
 create mode 100644 data/2022/neurips/Model Zoos: A Dataset of Diverse Populations of Neural Network Models
 create mode 100644 data/2022/neurips/Model-Based Imitation Learning for Urban Driving
 create mode 100644 data/2022/neurips/Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
 create mode 100644 data/2022/neurips/Model-Based Opponent Modeling
 create mode 100644 data/2022/neurips/Model-based Lifelong Reinforcement Learning with Bayesian Exploration
 create mode 100644 data/2022/neurips/Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity
 create mode 100644 data/2022/neurips/Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm
 create mode 100644 data/2022/neurips/Modeling Human Exploration Through Resource-Rational Reinforcement Learning
 create mode 100644 data/2022/neurips/Modeling Transitivity and Cyclicity in Directed Graphs via Binary Code Box Embeddings
 create mode 100644 data/2022/neurips/Modeling the Machine Learning Multiverse
 create mode 100644 data/2022/neurips/Models Out of Line: A Fourier Lens on Distribution Shift Robustness
 create mode 100644 data/2022/neurips/Moderate-fitting as a Natural Backdoor Defender for Pre-trained Language Models
 create mode 100644 data/2022/neurips/Modular Flows: Differential Molecular Generation
 create mode 100644 data/2022/neurips/Module-Aware Optimization for Auxiliary Learning
 create mode 100644 data/2022/neurips/Molecule Generation by Principal Subgraph Mining and Assembling
 create mode 100644 data/2022/neurips/Moment Distributionally Robust Tree Structured Prediction
 create mode 100644 data/2022/neurips/Momentum Adversarial Distillation: Handling Large Distribution Shifts in Data-Free Knowledge Distillation
 create mode 100644 data/2022/neurips/Momentum Aggregation for Private Non-convex ERM
 create mode 100644 data/2022/neurips/MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction
 create mode 100644 data/2022/neurips/Monocular Dynamic View Synthesis: A Reality Check
 create mode 100644 data/2022/neurips/Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations
 create mode 100644 data/2022/neurips/Monte Carlo Tree Descent for Black-Box Optimization
 create mode 100644 data/2022/neurips/Monte Carlo Tree Search based Variable Selection for High Dimensional Bayesian Optimization
 create mode 100644 data/2022/neurips/MorphTE: Injecting Morphology in Tensorized Embeddings
 create mode 100644 data/2022/neurips/Most Activation Functions Can Win the Lottery Without Excessive Depth
 create mode 100644 data/2022/neurips/Motion Transformer with Global Intention Localization and Local Movement Refinement
 create mode 100644 data/2022/neurips/Movement Penalized Bayesian Optimization with Application to Wind Energy Systems
 create mode 100644 data/2022/neurips/MsSVT: Mixed-scale Sparse Voxel Transformer for 3D Object Detection on Point Clouds
 create mode 100644 data/2022/neurips/Muffliato: Peer-to-Peer Privacy Amplification for Decentralized Optimization and Averaging
 create mode 100644 data/2022/neurips/Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
 create mode 100644 data/2022/neurips/Multi-Class $H$-Consistency Bounds
 create mode 100644 data/2022/neurips/Multi-Fidelity Best-Arm Identification
 create mode 100644 data/2022/neurips/Multi-Game Decision Transformers
 create mode 100644 data/2022/neurips/Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
 create mode 100644 data/2022/neurips/Multi-Instance Causal Representation Learning for Instance Label Prediction and Out-of-Distribution Generalization
 create mode 100644 data/2022/neurips/Multi-LexSum: Real-world Summaries of Civil Rights Lawsuits at Multiple Granularities
 create mode 100644 data/2022/neurips/Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval
 create mode 100644 data/2022/neurips/Multi-Objective Deep Learning with Adaptive Reference Vectors
 create mode 100644 data/2022/neurips/Multi-Sample Training for Neural Image Compression
 create mode 100644 data/2022/neurips/Multi-Scale Adaptive Network for Single Image Denoising
 create mode 100644 data/2022/neurips/Multi-agent Dynamic Algorithm Configuration
 create mode 100644 data/2022/neurips/Multi-agent Performative Prediction with Greedy Deployment and Consensus Seeking Agents
 create mode 100644 data/2022/neurips/Multi-block Min-max Bilevel Optimization with Applications in Multi-task Deep AUC Maximization
 create mode 100644 data/2022/neurips/Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization
 create mode 100644 data/2022/neurips/Multi-dataset Training of Transformers for Robust Action Recognition
 create mode 100644 data/2022/neurips/Multi-fidelity Monte Carlo: a pseudo-marginal approach
 create mode 100644 data/2022/neurips/Multi-layer State Evolution Under Random Convolutional Design
 create mode 100644 data/2022/neurips/Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing
 create mode 100644 data/2022/neurips/Multi-objective Deep Data Generation with Correlated Property Control
 create mode 100644 data/2022/neurips/Multi-view Subspace Clustering on Topological Manifold
 create mode 100644 data/2022/neurips/MultiGuard: Provably Robust Multi-label Classification against Adversarial Examples
 create mode 100644 data/2022/neurips/MultiScan: Scalable RGBD scanning for 3D environments with articulated objects
 create mode 100644 data/2022/neurips/Multiagent Q-learning with Sub-Team Coordination
 create mode 100644 data/2022/neurips/Multiclass Learnability Beyond the PAC Framework: Universal Rates and Partial Concept Classes
 create mode 100644 data/2022/neurips/Multilingual Abusive Comment Detection at Scale for Indic Languages
 create mode 100644 data/2022/neurips/Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
 create mode 100644 data/2022/neurips/Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve
 create mode 100644 data/2022/neurips/Multivariate Time-Series Forecasting with Temporal Polynomial Graph Neural Networks
 create mode 100644 data/2022/neurips/Multiview Human Body Reconstruction from Uncalibrated Cameras
 create mode 100644 data/2022/neurips/Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation
 create mode 100644 data/2022/neurips/Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
 create mode 100644 data/2022/neurips/Myriad: a real-world testbed to bridge trajectory optimization and deep learning
 create mode 100644 "data/2022/neurips/M\302\263ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design"
 create mode 100644 data/2022/neurips/NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks
 create mode 100644 data/2022/neurips/NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search
 create mode 100644 data/2022/neurips/NAS-Bench-Suite-Zero: Accelerating Research on Zero Cost Proxies
 create mode 100644 data/2022/neurips/NCP: Neural Correspondence Prior for Effective Unsupervised Shape Matching
 create mode 100644 data/2022/neurips/NOMAD: Nonlinear Manifold Decoders for Operator Learning
 create mode 100644 data/2022/neurips/NOTE: Robust Continual Test-time Adaptation Against Temporal Correlation
 create mode 100644 data/2022/neurips/NS3: Neuro-symbolic Semantic Code Search
 create mode 100644 data/2022/neurips/NSNet: A General Neural Probabilistic Framework for Satisfiability Problems
 create mode 100644 data/2022/neurips/NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
 create mode 100644 data/2022/neurips/Natural Color Fool: Towards Boosting Black-box Unrestricted Attacks
 create mode 100644 data/2022/neurips/Natural gradient enables fast sampling in spiking neural networks
 create mode 100644 data/2022/neurips/Natural image synthesis for the retina with variational information bottleneck representation
 create mode 100644 data/2022/neurips/NaturalProver: Grounded Mathematical Proof Generation with Language Models
 create mode 100644 data/2022/neurips/Navigating Memory Construction by Global Pseudo-Task Simulation for Continual Learning
 create mode 100644 data/2022/neurips/NeMF: Neural Motion Fields for Kinematic Animation
 create mode 100644 data/2022/neurips/Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs
 create mode 100644 data/2022/neurips/Near-Isometric Properties of Kronecker-Structured Random Tensor Embeddings
 create mode 100644 data/2022/neurips/Near-Optimal Collaborative Learning in Bandits
 create mode 100644 data/2022/neurips/Near-Optimal Correlation Clustering with Privacy
 create mode 100644 data/2022/neurips/Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments
 create mode 100644 data/2022/neurips/Near-Optimal Multi-Agent Learning for Safe Coverage Control
 create mode 100644 data/2022/neurips/Near-Optimal No-Regret Learning Dynamics for General Convex Games
 create mode 100644 data/2022/neurips/Near-Optimal Private and Scalable $k$-Clustering
 create mode 100644 data/2022/neurips/Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
 create mode 100644 data/2022/neurips/Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
 create mode 100644 data/2022/neurips/Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
 create mode 100644 data/2022/neurips/Near-Optimal Sample Complexity Bounds for Constrained MDPs
 create mode 100644 data/2022/neurips/Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
 create mode 100644 data/2022/neurips/Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs
 create mode 100644 data/2022/neurips/Nearly-Tight Bounds for Testing Histogram Distributions
 create mode 100644 data/2022/neurips/NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning
 create mode 100644 data/2022/neurips/Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization
 create mode 100644 data/2022/neurips/Network change point localisation under local differential privacy
 create mode 100644 data/2022/neurips/NeuForm: Adaptive Overfitting for Neural Shape Editing
 create mode 100644 data/2022/neurips/NeuPhysics: Editable Neural Geometry and Physics from Monocular Videos
 create mode 100644 data/2022/neurips/Neur2SP: Neural Two-Stage Stochastic Programming
 create mode 100644 data/2022/neurips/NeurOLight: A Physics-Agnostic Neural Operator Enabling Parametric Photonic Device Simulation
 create mode 100644 data/2022/neurips/Neural Abstractions
 create mode 100644 data/2022/neurips/Neural Approximation of Graph Topological Features
 create mode 100644 data/2022/neurips/Neural Attentive Circuits
 create mode 100644 data/2022/neurips/Neural Basis Models for Interpretability
 create mode 100644 data/2022/neurips/Neural Circuit Architectural Priors for Embodied Control
 create mode 100644 data/2022/neurips/Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold
 create mode 100644 data/2022/neurips/Neural Conservation Laws: A Divergence-Free Perspective
 create mode 100644 data/2022/neurips/Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules
 create mode 100644 data/2022/neurips/Neural Estimation of Submodular Functions with Applications to Differentiable Subset Selection
 create mode 100644 data/2022/neurips/Neural Lyapunov Control of Unknown Nonlinear Systems with Stability Guarantees
 create mode 100644 data/2022/neurips/Neural Matching Fields: Implicit Representation of Matching Fields for Visual Correspondence
 create mode 100644 data/2022/neurips/Neural Network Architecture Beyond Width and Depth
 create mode 100644 data/2022/neurips/Neural Payoff Machines: Predicting Fair and Stable Payoff Allocations Among Team Members
 create mode 100644 data/2022/neurips/Neural Set Function Extensions: Learning with Discrete Functions in High Dimensions
 create mode 100644 data/2022/neurips/Neural Shape Deformation Priors
 create mode 100644 data/2022/neurips/Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs
 create mode 100644 data/2022/neurips/Neural Stochastic Control
 create mode 100644 data/2022/neurips/Neural Stochastic PDEs: Resolution-Invariant Learning of Continuous Spatiotemporal Dynamics
 create mode 100644 data/2022/neurips/Neural Surface Reconstruction of Dynamic Scenes with Monocular RGB-D Camera
 create mode 100644 data/2022/neurips/Neural Temporal Walks: Motif-Aware Representation Learning on Continuous-Time Dynamic Graphs
 create mode 100644 data/2022/neurips/Neural Topological Ordering for Computation Graphs
 create mode 100644 data/2022/neurips/Neural Transmitted Radiance Fields
 create mode 100644 data/2022/neurips/Neural-Symbolic Entangled Framework for Complex Query Answering
 create mode 100644 data/2022/neurips/NeuroSchedule: A Novel Effective GNN-based Scheduling Method for High-level Synthesis
 create mode 100644 data/2022/neurips/Neuron with Steady Response Leads to Better Generalization
 create mode 100644 data/2022/neurips/Neurosymbolic Deep Generative Models for Sequence Data with Relational Constraints
 create mode 100644 data/2022/neurips/New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound
 create mode 100644 data/2022/neurips/New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma
 create mode 100644 data/2022/neurips/No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit
 create mode 100644 data/2022/neurips/No-regret learning in games with noisy feedback: Faster rates and adaptivity via learning rate separation
 create mode 100644 data/2022/neurips/Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world
 create mode 100644 data/2022/neurips/NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification
 create mode 100644 data/2022/neurips/Noise Attention Learning: Enhancing Noise Robustness by Gradient Scaling
 create mode 100644 data/2022/neurips/Non-Convex Bilevel Games with Critical Point Selection Maps
 create mode 100644 data/2022/neurips/Non-Gaussian Tensor Programs
 create mode 100644 data/2022/neurips/Non-Linear Coordination Graphs
 create mode 100644 data/2022/neurips/Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
 create mode 100644 data/2022/neurips/Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning
 create mode 100644 data/2022/neurips/Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation
 create mode 100644 data/2022/neurips/Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret
 create mode 100644 data/2022/neurips/Non-convex online learning via algorithmic equivalence
 create mode 100644 data/2022/neurips/Non-deep Networks
 create mode 100644 data/2022/neurips/Non-identifiability and the Blessings of Misspecification in Models of Molecular Fitness
 create mode 100644 data/2022/neurips/Non-monotonic Resource Utilization in the Bandits with Knapsacks Problem
 create mode 100644 data/2022/neurips/Non-rigid Point Cloud Registration with Neural Deformation Pyramid
 create mode 100644 data/2022/neurips/Non-stationary Bandits with Knapsacks
 create mode 100644 data/2022/neurips/Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting
 create mode 100644 data/2022/neurips/Nonlinear MCMC for Bayesian Machine Learning
 create mode 100644 data/2022/neurips/Nonlinear Sufficient Dimension Reduction with a Stochastic Neural Network
 create mode 100644 data/2022/neurips/Nonnegative Tensor Completion via Integer Optimization
 create mode 100644 data/2022/neurips/Nonparametric Uncertainty Quantification for Single Deterministic Neural Network
 create mode 100644 data/2022/neurips/Nonstationary Dual Averaging and Online Fair Allocation
 create mode 100644 data/2022/neurips/Normalizing Flows for Knockoff-free Controlled Feature Selection
 create mode 100644 data/2022/neurips/Not All Bits have Equal Value: Heterogeneous Precisions via Trainable Noise
 create mode 100644 data/2022/neurips/Not too little, not too much: a theoretical analysis of graph (over)smoothing
 create mode 100644 data/2022/neurips/OGC: Unsupervised 3D Object Segmentation from Rigid Dynamics of Point Clouds
 create mode 100644 data/2022/neurips/OLIVES Dataset: Ophthalmic Labels for Investigating Visual Eye Semantics
 create mode 100644 data/2022/neurips/OOD Link Prediction Generalization Capabilities of Message-Passing GNNs in Larger Test Graphs
 create mode 100644 data/2022/neurips/OPEN: Orthogonal Propagation with Ego-Network Modeling
 create mode 100644 data/2022/neurips/ORIENT: Submodular Mutual Information Measures for Data Subset Selection under Distribution Shift
 create mode 100644 data/2022/neurips/OST: Improving Generalization of DeepFake Detection via One-Shot Test-Time Training
 create mode 100644 data/2022/neurips/OTKGE: Multi-modal Knowledge Graph Embeddings via Optimal Transport
 create mode 100644 data/2022/neurips/Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks
 create mode 100644 data/2022/neurips/Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation
 create mode 100644 data/2022/neurips/Object Scene Representation Transformer
 create mode 100644 data/2022/neurips/Object-Category Aware Reinforcement Learning
 create mode 100644 data/2022/neurips/OccGen: Selection of Real-world Multilingual Parallel Data Balanced in Gender within Occupations
 create mode 100644 data/2022/neurips/Off-Policy Evaluation for Action-Dependent Non-stationary Environments
 create mode 100644 data/2022/neurips/Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models
 create mode 100644 data/2022/neurips/Off-Policy Evaluation with Deficient Support Using Side Information
 create mode 100644 data/2022/neurips/Off-Policy Evaluation with Policy-Dependent Optimization Response
 create mode 100644 data/2022/neurips/Off-Team Learning
 create mode 100644 data/2022/neurips/Offline Goal-Conditioned Reinforcement Learning via $f$-Advantage Regression
 create mode 100644 data/2022/neurips/Offline Multi-Agent Reinforcement Learning with Knowledge Distillation
 create mode 100644 data/2022/neurips/Okapi: Generalising Better by Making Statistical Matches Match
 create mode 100644 data/2022/neurips/Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again
 create mode 100644 data/2022/neurips/OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
 create mode 100644 data/2022/neurips/On A Mallows-type Model For (Ranked) Choices
 create mode 100644 data/2022/neurips/On Analyzing Generative and Denoising Capabilities of Diffusion-based Deep Generative Models
 create mode 100644 data/2022/neurips/On Batch Teaching with Sample Complexity Bounded by VCD
 create mode 100644 data/2022/neurips/On Computing Probabilistic Explanations for Decision Trees
 create mode 100644 data/2022/neurips/On Convergence of FedProx: Local Dissimilarity Invariant Bounds, Non-smoothness and Beyond
 create mode 100644 data/2022/neurips/On Deep Generative Models for Approximation and Estimation of Distributions on Manifolds
 create mode 100644 data/2022/neurips/On Divergence Measures for Bayesian Pseudocoresets
 create mode 100644 data/2022/neurips/On Efficient Online Imitation Learning via Classification
 create mode 100644 data/2022/neurips/On Elimination Strategies for Bandit Fixed-Confidence Identification
 create mode 100644 data/2022/neurips/On Embeddings for Numerical Features in Tabular Deep Learning
 create mode 100644 data/2022/neurips/On Enforcing Better Conditioned Meta-Learning for Rapid Few-Shot Adaptation
 create mode 100644 data/2022/neurips/On Feature Learning in the Presence of Spurious Correlations
 create mode 100644 data/2022/neurips/On Gap-dependent Bounds for Offline Reinforcement Learning
 create mode 100644 data/2022/neurips/On Image Segmentation With Noisy Labels: Characterization and Volume Properties of the Optimal Solutions to Accuracy and Dice
 create mode 100644 data/2022/neurips/On Infinite Separations Between Simple and Optimal Mechanisms
 create mode 100644 data/2022/neurips/On Kernelized Multi-Armed Bandits with Constraints
 create mode 100644 data/2022/neurips/On Learning Fairness and Accuracy on Multiple Subgroups
 create mode 100644 data/2022/neurips/On Learning and Refutation in Noninteractive Local Differential Privacy
 create mode 100644 data/2022/neurips/On Leave-One-Out Conditional Mutual Information For Generalization
 create mode 100644 data/2022/neurips/On Margin Maximization in Linear and ReLU Networks
 create mode 100644 data/2022/neurips/On Margins and Generalisation for Voting Classifiers
 create mode 100644 data/2022/neurips/On Measuring Excess Capacity in Neural Networks
 create mode 100644 data/2022/neurips/On Non-Linear operators for Geometric Deep Learning
 create mode 100644 data/2022/neurips/On Optimal Learning Under Targeted Data Poisoning
 create mode 100644 data/2022/neurips/On Privacy and Personalization in Cross-Silo Federated Learning
 create mode 100644 data/2022/neurips/On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
 create mode 100644 data/2022/neurips/On Robust Multiclass Learnability
 create mode 100644 data/2022/neurips/On Sample Optimality in Personalized Collaborative and Federated Learning
 create mode 100644 data/2022/neurips/On Scalable Testing of Samplers
 create mode 100644 data/2022/neurips/On Scrambling Phenomena for Randomly Initialized Recurrent Networks
 create mode 100644 data/2022/neurips/On Translation and Reconstruction Guarantees of the Cycle-Consistent Generative Adversarial Networks
 create mode 100644 data/2022/neurips/On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification
 create mode 100644 data/2022/neurips/On global convergence of ResNets: From finite to infinite width using linear parameterization
 create mode 100644 data/2022/neurips/On the Adversarial Robustness of Mixture of Experts
 create mode 100644 data/2022/neurips/On the Complexity of Adversarial Decision Making
 create mode 100644 data/2022/neurips/On the Convergence Theory for Hessian-Free Bilevel Algorithms
 create mode 100644 data/2022/neurips/On the Convergence of Stochastic Multi-Objective Gradient Manipulation and Beyond
 create mode 100644 data/2022/neurips/On the Discrimination Risk of Mean Aggregation Feature Imputation in Graphs
 create mode 100644 data/2022/neurips/On the Double Descent of Random Features Models Trained with SGD
 create mode 100644 data/2022/neurips/On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning
 create mode 100644 data/2022/neurips/On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias
 create mode 100644 data/2022/neurips/On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning
 create mode 100644 data/2022/neurips/On the Effectiveness of Lipschitz-Driven Rehearsal in Continual Learning
 create mode 100644 data/2022/neurips/On the Effectiveness of Persistent Homology
 create mode 100644 data/2022/neurips/On the Efficient Implementation of High Accuracy Optimality of Profile Maximum Likelihood
 create mode 100644 data/2022/neurips/On the Epistemic Limits of Personalized Prediction
 create mode 100644 data/2022/neurips/On the Frequency-bias of Coordinate-MLPs
 create mode 100644 data/2022/neurips/On the Generalizability and Predictability of Recommender Systems
 create mode 100644 data/2022/neurips/On the Generalization Power of the Overfitted Three-Layer Neural Tangent Kernel Model
 create mode 100644 data/2022/neurips/On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games
 create mode 100644 data/2022/neurips/On the Identifiability of Nonlinear ICA: Sparsity and Beyond
 create mode 100644 data/2022/neurips/On the Importance of Gradient Norm in PAC-Bayesian Bounds
 create mode 100644 data/2022/neurips/On the Interpretability of Regularisation for Neural Networks Through Model Gradient Similarity
 create mode 100644 data/2022/neurips/On the Learning Mechanisms in Physical Reasoning
 create mode 100644 data/2022/neurips/On the Limitations of Stochastic Pre-processing Defenses
 create mode 100644 data/2022/neurips/On the Parameterization and Initialization of Diagonal State Space Models
 create mode 100644 data/2022/neurips/On the Representation Collapse of Sparse Mixture of Experts
 create mode 100644 data/2022/neurips/On the Robustness of Deep Clustering Models: Adversarial Attacks and Defenses
 create mode 100644 data/2022/neurips/On the Robustness of Graph Neural Diffusion to Topology Perturbations
 create mode 100644 data/2022/neurips/On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
 create mode 100644 data/2022/neurips/On the Safety of Interpretable Machine Learning: A Maximum Deviation Approach
 create mode 100644 data/2022/neurips/On the Sample Complexity of Stabilizing LTI Systems on a Single Trajectory
 create mode 100644 data/2022/neurips/On the Spectral Bias of Convolutional Neural Tangent and Gaussian Process Kernels
 create mode 100644 data/2022/neurips/On the Stability and Scalability of Node Perturbation Learning
 create mode 100644 data/2022/neurips/On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL
 create mode 100644 data/2022/neurips/On the Strong Correlation Between Model Invariance and Generalization
 create mode 100644 data/2022/neurips/On the Symmetries of Deep Learning Models and their Internal Representations
 create mode 100644 data/2022/neurips/On the Theoretical Properties of Noise Correlation in Stochastic Optimization
 create mode 100644 data/2022/neurips/On the Tradeoff Between Robustness and Fairness
 create mode 100644 data/2022/neurips/On the consistent estimation of optimal Receiver Operating Characteristic (ROC) curve
 create mode 100644 data/2022/neurips/On the convergence of policy gradient methods to Nash equilibria in general stochastic games
 create mode 100644 data/2022/neurips/On the detrimental effect of invariances in the likelihood for variational inference
 create mode 100644 data/2022/neurips/On the difficulty of learning chaotic dynamics with RNNs
 create mode 100644 data/2022/neurips/On the generalization of learning algorithms that do not converge
 create mode 100644 data/2022/neurips/On the inability of Gaussian process regression to optimally learn compositional functions
 create mode 100644 data/2022/neurips/On the non-universality of deep learning: quantifying the cost of symmetry
 create mode 100644 data/2022/neurips/On the relationship between variational inference and auto-associative memory
 create mode 100644 data/2022/neurips/On the role of overparameterization in off-policy Temporal Difference learning with linear function approximation
 create mode 100644 data/2022/neurips/On the symmetries of the synchronization problem in Cryo-EM: Multi-Frequency Vector Diffusion Maps on the Projective Plane
 create mode 100644 data/2022/neurips/On-Demand Sampling: Learning Optimally from Multiple Distributions
 create mode 100644 data/2022/neurips/On-Device Training Under 256KB Memory
 create mode 100644 data/2022/neurips/One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations
 create mode 100644 data/2022/neurips/One Positive Label is Sufficient: Single-Positive Multi-Label Learning with Label Enhancement
 create mode 100644 data/2022/neurips/One for All: Simultaneous Metric and Preference Learning over Multiple Users
 create mode 100644 data/2022/neurips/One-Inlier is First: Towards Efficient Position Encoding for Point Cloud Registration
 create mode 100644 data/2022/neurips/One-shot Neural Backdoor Erasing via Adversarial Weight Masking
 create mode 100644 data/2022/neurips/OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models
 create mode 100644 data/2022/neurips/Online Agnostic Multiclass Boosting
 create mode 100644 data/2022/neurips/Online Algorithms for the Santa Claus Problem
 create mode 100644 data/2022/neurips/Online Allocation and Learning in the Presence of Strategic Agents
 create mode 100644 data/2022/neurips/Online Bipartite Matching with Advice: Tight Robustness-Consistency Tradeoffs for the Two-Stage Model
 create mode 100644 data/2022/neurips/Online Convex Optimization with Hard Constraints: Towards the Best of Two Worlds and Beyond
 create mode 100644 data/2022/neurips/Online Decision Mediation
 create mode 100644 data/2022/neurips/Online Deep Equilibrium Learning for Regularization by Denoising
 create mode 100644 data/2022/neurips/Online Frank-Wolfe with Arbitrary Delays
 create mode 100644 data/2022/neurips/Online Learning and Pricing for Network Revenue Management with Reusable Resources
 create mode 100644 data/2022/neurips/Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications
 create mode 100644 data/2022/neurips/Online Neural Sequence Detection with Hierarchical Dirichlet Point Process
 create mode 100644 data/2022/neurips/Online PAC-Bayes Learning
 create mode 100644 data/2022/neurips/Online Reinforcement Learning for Mixed Policy Scopes
 create mode 100644 data/2022/neurips/Online Training Through Time for Spiking Neural Networks
 create mode 100644 data/2022/neurips/Ontologue: Declarative Benchmark Construction for Ontological Multi-Label Classification
 create mode 100644 data/2022/neurips/Open High-Resolution Satellite Imagery: The WorldStrat Dataset - With Application to Super-Resolution
 create mode 100644 data/2022/neurips/Open-Ended Reinforcement Learning with Neural Reward Functions
 create mode 100644 data/2022/neurips/OpenAUC: Towards AUC-Oriented Open-Set Recognition
 create mode 100644 data/2022/neurips/OpenFWI: Large-scale Multi-structural Benchmark Datasets for Full Waveform Inversion
 create mode 100644 data/2022/neurips/OpenFilter: A Framework to Democratize Research Access to Social Media AR Filters
 create mode 100644 data/2022/neurips/OpenOOD: Benchmarking Generalized Out-of-Distribution Detection
 create mode 100644 data/2022/neurips/OpenSRH: optimizing brain tumor surgery using intraoperative stimulated Raman histology
 create mode 100644 data/2022/neurips/OpenXAI: Towards a Transparent Evaluation of Model Explanations
 create mode 100644 data/2022/neurips/Operative dimensions in unconstrained connectivity of recurrent neural networks
 create mode 100644 data/2022/neurips/Operator Splitting Value Iteration
 create mode 100644 data/2022/neurips/Optimal Algorithms for Decentralized Stochastic Variational Inequalities
 create mode 100644 data/2022/neurips/Optimal Binary Classification Beyond Accuracy
 create mode 100644 data/2022/neurips/Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
 create mode 100644 data/2022/neurips/Optimal Comparator Adaptive Online Learning with Switching Cost
 create mode 100644 data/2022/neurips/Optimal Dynamic Regret in LQR Control
 create mode 100644 data/2022/neurips/Optimal Efficiency-Envy Trade-Off via Optimal Transport
 create mode 100644 data/2022/neurips/Optimal Gradient Sliding and its Application to Optimal Distributed Optimization Under Similarity
 create mode 100644 data/2022/neurips/Optimal Positive Generation via Latent Transformation for Contrastive Learning
 create mode 100644 data/2022/neurips/Optimal Query Complexities for Dynamic Trace Estimation
 create mode 100644 data/2022/neurips/Optimal Rates for Regularized Conditional Mean Embedding Learning
 create mode 100644 data/2022/neurips/Optimal Scaling for Locally Balanced Proposals in Discrete Spaces
 create mode 100644 data/2022/neurips/Optimal Transport of Classifiers to Fairness
 create mode 100644 data/2022/neurips/Optimal Transport-based Identity Matching for Identity-invariant Facial Expression Recognition
 create mode 100644 data/2022/neurips/Optimal Weak to Strong Learning
 create mode 100644 data/2022/neurips/Optimal and Adaptive Monteiro-Svaiter Acceleration
 create mode 100644 data/2022/neurips/Optimal-er Auctions through Attention
 create mode 100644 data/2022/neurips/Optimistic Mirror Descent Either Converges to Nash or to Strong Coarse Correlated Equilibria in Bimatrix Games
 create mode 100644 data/2022/neurips/Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
 create mode 100644 data/2022/neurips/Optimistic Tree Searches for Combinatorial Black-Box Optimization
 create mode 100644 data/2022/neurips/Optimizing Data Collection for Machine Learning
 create mode 100644 data/2022/neurips/Optimizing Relevance Maps of Vision Transformers Improves Robustness
 create mode 100644 data/2022/neurips/Oracle Inequalities for Model Selection in Offline Reinforcement Learning
 create mode 100644 data/2022/neurips/Oracle-Efficient Online Learning for Smoothed Adversaries
 create mode 100644 data/2022/neurips/Order-Invariant Cardinality Estimators Are Differentially Private
 create mode 100644 data/2022/neurips/Ordered Subgraph Aggregation Networks
 create mode 100644 data/2022/neurips/OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression
 create mode 100644 data/2022/neurips/Orthogonal Transformer: An Efficient Vision Transformer Backbone with Token Orthogonalization
 create mode 100644 data/2022/neurips/Oscillatory Tracking of Continuous Attractor Neural Networks Account for Phase Precession and Procession of Hippocampal Place Cells
 create mode 100644 data/2022/neurips/Out-of-Distribution Detection via Conditional Kernel Independence Model
 create mode 100644 data/2022/neurips/Out-of-Distribution Detection with An Adaptive Likelihood Ratio on Informative Hierarchical VAE
 create mode 100644 data/2022/neurips/Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
 create mode 100644 data/2022/neurips/Outlier-Robust Sparse Estimation via Non-Convex Optimization
 create mode 100644 data/2022/neurips/Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions
 create mode 100644 data/2022/neurips/Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling
 create mode 100644 data/2022/neurips/Overparameterization from Computational Constraints
 create mode 100644 data/2022/neurips/P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
 create mode 100644 data/2022/neurips/PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
 create mode 100644 data/2022/neurips/PAC: Assisted Value Factorization with Counterfactual Predictions in Multi-Agent Reinforcement Learning
 create mode 100644 data/2022/neurips/PALBERT: Teaching ALBERT to Ponder
 create mode 100644 data/2022/neurips/PALMER: Perception - Action Loop with Memory for Long-Horizon Planning
 create mode 100644 data/2022/neurips/PDEBench: An Extensive Benchmark for Scientific Machine Learning
 create mode 100644 data/2022/neurips/PDSketch: Integrated Domain Programming, Learning, and Planning
 create mode 100644 data/2022/neurips/PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding
 create mode 100644 data/2022/neurips/PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient
 create mode 100644 data/2022/neurips/PROSPECT: Labeled Tandem Mass Spectrometry Dataset for Machine Learning in Proteomics
 create mode 100644 data/2022/neurips/PaCo: Parameter-Compositional Multi-task Reinforcement Learning
 create mode 100644 data/2022/neurips/Palm up: Playing in the Latent Manifold for Unsupervised Pretraining
 create mode 100644 data/2022/neurips/Panchromatic and Multispectral Image Fusion via Alternating Reverse Filtering Network
 create mode 100644 data/2022/neurips/Parallel Tempering With a Variational Reference
 create mode 100644 data/2022/neurips/Parameter tuning and model selection in Optimal Transport with semi-dual Brenier formulation
 create mode 100644 data/2022/neurips/Parameter-Efficient Masking Networks
 create mode 100644 data/2022/neurips/Parameter-free Dynamic Graph Embedding for Link Prediction
 create mode 100644 data/2022/neurips/Parameter-free Regret in High Probability with Heavy Tails
 create mode 100644 data/2022/neurips/Parameters or Privacy: A Provable Tradeoff Between Overparameterization and Membership Inference
 create mode 100644 data/2022/neurips/Parametrically Retargetable Decision-Makers Tend To Seek Power
 create mode 100644 data/2022/neurips/Paraphrasing Is All You Need for Novel Object Captioning
 create mode 100644 data/2022/neurips/Pareto Set Learning for Expensive Multi-Objective Optimization
 create mode 100644 data/2022/neurips/Partial Identification of Treatment Effects with Implicit Generative Models
 create mode 100644 data/2022/neurips/PatchComplete: Learning Multi-Resolution Patch Priors for 3D Shape Completion on Unseen Categories
 create mode 100644 data/2022/neurips/Patching open-vocabulary models by interpolating weights
 create mode 100644 data/2022/neurips/Path Independent Equilibrium Models Can Better Exploit Test-Time Computation
 create mode 100644 data/2022/neurips/Pay attention to your loss : understanding misconceptions about Lipschitz neural networks
 create mode 100644 data/2022/neurips/PeRFception: Perception using Radiance Fields
 create mode 100644 data/2022/neurips/Peer Prediction for Learning Agents
 create mode 100644 data/2022/neurips/Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop
 create mode 100644 data/2022/neurips/Perfect Sampling from Pairwise Comparisons
 create mode 100644 data/2022/neurips/PerfectDou: Dominating DouDizhu with Perfect Information Distillation
 create mode 100644 data/2022/neurips/Performative Power
 create mode 100644 data/2022/neurips/Periodic Graph Transformers for Crystal Material Property Prediction
 create mode 100644 data/2022/neurips/Peripheral Vision Transformer
 create mode 100644 data/2022/neurips/Personalized Federated Learning towards Communication Efficiency, Robustness and Fairness
 create mode 100644 data/2022/neurips/Personalized Online Federated Learning with Multiple Kernels
 create mode 100644 data/2022/neurips/Perturbation Learning Based Anomaly Detection
 create mode 100644 data/2022/neurips/Phase Transition from Clean Training to Adversarial Training
 create mode 100644 data/2022/neurips/Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks
 create mode 100644 data/2022/neurips/Phase transitions in when feedback is useful
 create mode 100644 data/2022/neurips/Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
 create mode 100644 data/2022/neurips/PhysGNN: A Physics-Driven Graph Neural Network Based Model for Predicting Soft Tissue Deformation in Image-Guided Neurosurgery
 create mode 100644 data/2022/neurips/Physically-Based Face Rendering for NIR-VIS Face Recognition
 create mode 100644 data/2022/neurips/Physics-Embedded Neural Networks: Graph Neural PDE Solvers with Mixed Boundary Conditions
 create mode 100644 data/2022/neurips/Physics-Informed Implicit Representations of Equilibrium Network Flows
 create mode 100644 data/2022/neurips/Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome Homogenization?
 create mode 100644 data/2022/neurips/Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
 create mode 100644 data/2022/neurips/Pitfalls of Epistemic Uncertainty Quantification through Loss Minimisation
 create mode 100644 data/2022/neurips/Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning
 create mode 100644 data/2022/neurips/Planning for Sample Efficient Imitation Learning
 create mode 100644 data/2022/neurips/Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction
 create mode 100644 data/2022/neurips/PlasticityNet: Learning to Simulate Metal, Sand, and Snow for Optimization Time Integration
 create mode 100644 data/2022/neurips/Pluralistic Image Completion with Gaussian Mixture Models
 create mode 100644 data/2022/neurips/Point Transformer V2: Grouped Vector Attention and Partition-based Pooling
 create mode 100644 data/2022/neurips/Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
 create mode 100644 data/2022/neurips/PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies
 create mode 100644 data/2022/neurips/PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points
 create mode 100644 data/2022/neurips/Poisson Flow Generative Models
 create mode 100644 data/2022/neurips/PolarMix: A General Data Augmentation Technique for LiDAR Point Clouds
 create mode 100644 data/2022/neurips/Policy Gradient With Serial Markov Chain Reasoning
 create mode 100644 data/2022/neurips/Policy Optimization for Markov Games: Unified Framework and Faster Convergence
 create mode 100644 data/2022/neurips/Policy Optimization with Advantage Regularization for Long-Term Fairness in Decision Systems
 create mode 100644 data/2022/neurips/Policy Optimization with Linear Temporal Logic Constraints
 create mode 100644 data/2022/neurips/Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks
 create mode 100644 data/2022/neurips/Polynomial Neural Fields for Subband Decomposition and Manipulation
 create mode 100644 data/2022/neurips/Polynomial time guarantees for the Burer-Monteiro method
 create mode 100644 data/2022/neurips/Polynomial-Time Optimal Equilibria with a Mediator in Extensive-Form Games
 create mode 100644 data/2022/neurips/PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits
 create mode 100644 data/2022/neurips/Positive-Unlabeled Learning using Random Forests via Recursive Greedy Risk Minimization
 create mode 100644 data/2022/neurips/Positively Weighted Kernel Quadrature via Subsampling
 create mode 100644 data/2022/neurips/Post-hoc estimators for learning to defer to an expert
 create mode 100644 data/2022/neurips/Posted Pricing and Dynamic Prior-independent Mechanisms with Value Maximizers
 create mode 100644 data/2022/neurips/Posterior Collapse of a Linear Latent Variable Model
 create mode 100644 data/2022/neurips/Posterior Matching for Arbitrary Conditioning
 create mode 100644 data/2022/neurips/Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks
 create mode 100644 data/2022/neurips/Posterior and Computational Uncertainty in Gaussian Processes
 create mode 100644 data/2022/neurips/Power and limitations of single-qubit native quantum neural networks
 create mode 100644 data/2022/neurips/Practical Adversarial Attacks on Spatiotemporal Traffic Forecasting Models
 create mode 100644 data/2022/neurips/Practical Adversarial Multivalid Conformal Prediction
 create mode 100644 data/2022/neurips/Pragmatically Learning from Pedagogical Demonstrations in Multi-Goal Environments
 create mode 100644 data/2022/neurips/Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors
 create mode 100644 data/2022/neurips/Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning
 create mode 100644 data/2022/neurips/Pre-Trained Language Models for Interactive Decision-Making
 create mode 100644 data/2022/neurips/Pre-Trained Model Reusability Evaluation for Small-Data Transfer Learning
 create mode 100644 data/2022/neurips/Pre-activation Distributions Expose Backdoor Neurons
 create mode 100644 data/2022/neurips/Pre-trained Adversarial Perturbations
 create mode 100644 data/2022/neurips/Precise Learning Curves and Higher-Order Scalings for Dot-product Kernel Regression
 create mode 100644 data/2022/neurips/Precise Regret Bounds for Log-loss via a Truncated Bayesian Algorithm
 create mode 100644 data/2022/neurips/Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution
 create mode 100644 data/2022/neurips/Predicting Label Distribution from Multi-label Ranking
 create mode 100644 data/2022/neurips/Predictive Coding beyond Gaussian Distributions
 create mode 100644 data/2022/neurips/Predictive Querying for Autoregressive Neural Sequence Models
 create mode 100644 data/2022/neurips/Preservation of the Global Knowledge by Not-True Distillation in Federated Learning
 create mode 100644 data/2022/neurips/Privacy Induces Robustness: Information-Computation Gaps and Sparse Mean Estimation
 create mode 100644 data/2022/neurips/Privacy of Noisy Stochastic Gradient Descent: More Iterations without More Privacy Loss
 create mode 100644 data/2022/neurips/Private Estimation with Public Data
 create mode 100644 data/2022/neurips/Private Graph All-Pairwise-Shortest-Path Distance Release with Improved Error Rate
 create mode 100644 data/2022/neurips/Private Isotonic Regression
 create mode 100644 data/2022/neurips/Private Multiparty Perception for Navigation
 create mode 100644 data/2022/neurips/Private Set Generation with Discriminative Information
 create mode 100644 data/2022/neurips/Private Synthetic Data for Multitask Learning and Marginal Queries
 create mode 100644 data/2022/neurips/Private and Communication-Efficient Algorithms for Entropy Estimation
 create mode 100644 data/2022/neurips/Probabilistic Missing Value Imputation for Mixed Categorical and Ordered Data
 create mode 100644 data/2022/neurips/Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design
 create mode 100644 data/2022/neurips/Probable Domain Generalization via Quantile Risk Minimization
 create mode 100644 data/2022/neurips/Probing Classifiers are Unreliable for Concept Removal and Detection
 create mode 100644 data/2022/neurips/Procedural Image Programs for Representation Learning
 create mode 100644 data/2022/neurips/Product Ranking for Revenue Maximization with Multiple Purchases
 create mode 100644 data/2022/neurips/Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images
 create mode 100644 data/2022/neurips/Prompt Certified Machine Unlearning with Randomized Gradient Smoothing and Quantization
 create mode 100644 data/2022/neurips/Proppo: a Message Passing Framework for Customizable and Composable Learning Algorithms
 create mode 100644 data/2022/neurips/ProtoVAE: A Trustworthy Self-Explainable Prototypical Variational Model
 create mode 100644 data/2022/neurips/ProtoX: Explaining a Reinforcement Learning Agent via Prototyping
 create mode 100644 data/2022/neurips/Prototypical VoteNet for Few-Shot 3D Point Cloud Object Detection
 create mode 100644 data/2022/neurips/Provable Benefit of Multitask Representation Learning in Reinforcement Learning
 create mode 100644 data/2022/neurips/Provable Defense against Backdoor Policies in Reinforcement Learning
 create mode 100644 data/2022/neurips/Provable General Function Class Representation Learning in Multitask Bandits and MDP
 create mode 100644 data/2022/neurips/Provable Generalization of Overparameterized Meta-learning Trained with SGD
 create mode 100644 data/2022/neurips/Provable Subspace Identification Under Post-Nonlinear Mixtures
 create mode 100644 data/2022/neurips/Provably Adversarially Robust Detection of Out-of-Distribution Data (Almost) for Free
 create mode 100644 data/2022/neurips/Provably Efficient Model-Free Constrained RL with Linear Function Approximation
 create mode 100644 data/2022/neurips/Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
 create mode 100644 data/2022/neurips/Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems
 create mode 100644 data/2022/neurips/Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
 create mode 100644 data/2022/neurips/Provably expressive temporal graph networks
 create mode 100644 data/2022/neurips/Provably sample-efficient RL with side information about latent dynamics
 create mode 100644 data/2022/neurips/Provably tuning the ElasticNet across instances
 create mode 100644 data/2022/neurips/Proximal Learning With Opponent-Learning Awareness
 create mode 100644 data/2022/neurips/Proximal Point Imitation Learning
 create mode 100644 data/2022/neurips/Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks
 create mode 100644 data/2022/neurips/Pruning Neural Networks via Coresets and Convex Geometry: Towards No Assumptions
 create mode 100644 data/2022/neurips/Pruning has a disparate impact on model accuracy
 create mode 100644 data/2022/neurips/Pruning's Effect on Generalization Through the Lens of Training and Regularization
 create mode 100644 data/2022/neurips/Pseudo-Riemannian Graph Convolutional Networks
 create mode 100644 data/2022/neurips/Public Wisdom Matters! Discourse-Aware Hyperbolic Fourier Co-Attention for Social Text Classification
 create mode 100644 data/2022/neurips/PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation
 create mode 100644 data/2022/neurips/Pure Transformers are Powerful Graph Learners
 create mode 100644 data/2022/neurips/Pushing the limits of fairness impossibility: Who's the fairest of them all?
 create mode 100644 data/2022/neurips/Pyramid Attention For Source Code Summarization
 create mode 100644 data/2022/neurips/PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
 create mode 100644 data/2022/neurips/Pythae: Unifying Generative Autoencoders in Python - A Benchmarking Use Case
 create mode 100644 data/2022/neurips/Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
 create mode 100644 data/2022/neurips/QC-StyleGAN - Quality Controllable Image Generation and Manipulation
 create mode 100644 data/2022/neurips/QUARK: Controllable Text Generation with Reinforced Unlearning
 create mode 100644 data/2022/neurips/Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
 create mode 100644 data/2022/neurips/Quantifying Statistical Significance of Neural Network-based Image Segmentation by Selective Inference
 create mode 100644 data/2022/neurips/Quantile Constrained Reinforcement Learning: A Reinforcement Learning Framework Constraining Outage Probability
 create mode 100644 data/2022/neurips/Quantized Training of Gradient Boosting Decision Trees
 create mode 100644 data/2022/neurips/Quantum Algorithms for Sampling Log-Concave Distributions and Estimating Normalizing Constants
 create mode 100644 data/2022/neurips/Quantum Speedups of Optimizing Approximately Convex Functions with Applications to Logarithmic Regret Stochastic Convex Bandits
 create mode 100644 data/2022/neurips/Quasi-Newton Methods for Saddle Point Problems
 create mode 100644 data/2022/neurips/QueryPose: Sparse Multi-Person Pose Regression via Spatial-Aware Part-Level Query
 create mode 100644 data/2022/neurips/Queue Up Your Regrets: Achieving the Dynamic Capacity Region of Multiplayer Bandits
 create mode 100644 data/2022/neurips/Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking?
 create mode 100644 data/2022/neurips/RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
 create mode 100644 data/2022/neurips/REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
 create mode 100644 data/2022/neurips/RISE: Robust Individualized Decision Learning with Sensitive Variables
 create mode 100644 data/2022/neurips/RKHS-SHAP: Shapley Values for Kernel Methods
 create mode 100644 data/2022/neurips/RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection
 create mode 100644 data/2022/neurips/RNNs of RNNs: Recursive Construction of Stable Assemblies of Recurrent Neural Networks
 create mode 100644 data/2022/neurips/RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
 create mode 100644 data/2022/neurips/RSA: Reducing Semantic Shift from Aggressive Augmentations for Self-supervised Learning
 create mode 100644 data/2022/neurips/RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer
 create mode 100644 data/2022/neurips/RainNet: A Large-Scale Imagery Dataset and Benchmark for Spatial Precipitation Downscaling
 create mode 100644 data/2022/neurips/Random Normalization Aggregation for Adversarial Defense
 create mode 100644 data/2022/neurips/Random Rank: The One and Only Strategyproof and Proportionally Fair Randomized Facility Location Mechanism
 create mode 100644 data/2022/neurips/Random Sharpness-Aware Minimization
 create mode 100644 data/2022/neurips/Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets
 create mode 100644 data/2022/neurips/Randomized Message-Interception Smoothing: Gray-box Certificates for Graph Neural Networks
 create mode 100644 data/2022/neurips/Randomized Sketches for Clustering: Fast and Optimal Kernel $k$-Means
 create mode 100644 data/2022/neurips/Rank Diminishing in Deep Neural Networks
 create mode 100644 data/2022/neurips/RankFeat: Rank-1 Feature Removal for Out-of-distribution Detection
 create mode 100644 data/2022/neurips/Rapid Model Architecture Adaption for Meta-Learning
 create mode 100644 data/2022/neurips/Rapidly Mixing Multiple-try Metropolis Algorithms for Model Selection Problems
 create mode 100644 data/2022/neurips/Rare Gems: Finding Lottery Tickets at Initialization
 create mode 100644 data/2022/neurips/Rashomon Capacity: A Metric for Predictive Multiplicity in Classification
 create mode 100644 data/2022/neurips/Rate-Distortion Theoretic Bounds on Generalization Error for Distributed Learning
 create mode 100644 data/2022/neurips/Rate-Optimal Online Convex Optimization in Adaptive Linear Control
 create mode 100644 data/2022/neurips/Re-Analyze Gauss: Bounds for Private Matrix Approximation via Dyson Brownian Motion
 create mode 100644 data/2022/neurips/ReCo: Retrieve and Co-segment for Zero-shot Transfer
 create mode 100644 data/2022/neurips/ReFactor GNNs: Revisiting Factorisation-based Models from a Message-Passing Perspective
 create mode 100644 data/2022/neurips/Real-Valued Backpropagation is Unsuitable for Complex-Valued Neural Networks
 create mode 100644 data/2022/neurips/Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
 create mode 100644 data/2022/neurips/Receding Horizon Inverse Reinforcement Learning
 create mode 100644 data/2022/neurips/Recipe for a General, Powerful, Scalable Graph Transformer
 create mode 100644 data/2022/neurips/Recommender Forest for Efficient Retrieval
 create mode 100644 data/2022/neurips/Reconstructing Training Data From Trained Neural
Networks create mode 100644 data/2022/neurips/Reconstruction on Trees and Low-Degree Polynomials create mode 100644 data/2022/neurips/Recovering Private Text in Federated Learning of Language Models create mode 100644 data/2022/neurips/Recruitment Strategies That Take a Chance create mode 100644 data/2022/neurips/Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms create mode 100644 data/2022/neurips/Recurrent Memory Transformer create mode 100644 data/2022/neurips/Recurrent Video Restoration Transformer with Guided Deformable Attention create mode 100644 data/2022/neurips/Recursive Reasoning in Minimax Games: A Level $k$ Gradient Play Method create mode 100644 data/2022/neurips/Recursive Reinforcement Learning create mode 100644 data/2022/neurips/RecursiveMix: Mixed Learning with History create mode 100644 data/2022/neurips/Redeeming intrinsic rewards via constrained optimization create mode 100644 data/2022/neurips/Redistribution of Weights and Activations for AdderNet Quantization create mode 100644 data/2022/neurips/Reduced Representation of Deformation Fields for Effective Non-rigid Shape Matching create mode 100644 data/2022/neurips/Reduction Algorithms for Persistence Diagrams of Networks: CoralTDA and PrunIT create mode 100644 data/2022/neurips/Redundancy-Free Message Passing for Graph Neural Networks create mode 100644 data/2022/neurips/Redundant representations help generalization in wide neural networks create mode 100644 data/2022/neurips/Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Translation Model create mode 100644 data/2022/neurips/Regret Bounds for Information-Directed Reinforcement Learning create mode 100644 data/2022/neurips/Regret Bounds for Multilabel Classification in Sparse Label Regimes create mode 100644 data/2022/neurips/Regret Bounds for Risk-Sensitive Reinforcement Learning create mode 100644 data/2022/neurips/Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games create mode 100644 data/2022/neurips/Regularized Molecular Conformation Fields create mode 100644 data/2022/neurips/Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress create mode 100644 data/2022/neurips/Reinforced Genetic Algorithm for Structure-based Drug Design create mode 100644 data/2022/neurips/Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space create mode 100644 data/2022/neurips/Reinforcement Learning with Automated Auxiliary Loss Search create mode 100644 data/2022/neurips/Reinforcement Learning with Logarithmic Regret and Policy Switches create mode 100644 data/2022/neurips/Reinforcement Learning with Neural Radiance Fields create mode 100644 data/2022/neurips/Reinforcement Learning with Non-Exponential Discounting create mode 100644 data/2022/neurips/Reinforcement Learning with a Terminator create mode 100644 data/2022/neurips/Relation-Constrained Decoding for Text Generation create mode 100644 data/2022/neurips/Relational Proxies: Emergent Relationships as Fine-Grained Discriminators create mode 100644 data/2022/neurips/Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL create mode 100644 data/2022/neurips/Relaxing Equivariance Constraints with Non-stationary Continuous Filters create mode 100644 data/2022/neurips/Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks create mode 100644 data/2022/neurips/Renyi Differential Privacy of Propose-Test-Release and 
Applications to Private and Robust Machine Learning create mode 100644 data/2022/neurips/Repairing Neural Networks by Leaving the Right Past Behind create mode 100644 data/2022/neurips/Representing Spatial Trajectories as Distributions create mode 100644 data/2022/neurips/Reproducibility in Optimization: Theoretical Framework and Limits create mode 100644 data/2022/neurips/ResQ: A Residual Q Function-based Approach for Multi-Agent Reinforcement Learning Value Factorization create mode 100644 data/2022/neurips/ResT V2: Simpler, Faster and Stronger create mode 100644 data/2022/neurips/Residual Multiplicative Filter Networks for Multiscale Reconstruction create mode 100644 data/2022/neurips/Resolving the data ambiguity for periodic crystals create mode 100644 data/2022/neurips/Resource-Adaptive Federated Learning with All-In-One Neural Composition create mode 100644 data/2022/neurips/Respecting Transfer Gap in Knowledge Distillation create mode 100644 data/2022/neurips/Retaining Knowledge for Learning with Dynamic Definition create mode 100644 data/2022/neurips/Rethinking Alignment in Video Super-Resolution Transformers create mode 100644 data/2022/neurips/Rethinking Generalization in Few-Shot Classification create mode 100644 data/2022/neurips/Rethinking Image Restoration for Object Detection create mode 100644 data/2022/neurips/Rethinking Individual Global Max in Cooperative Multi-Agent Reinforcement Learning create mode 100644 data/2022/neurips/Rethinking Knowledge Graph Evaluation Under the Open-World Assumption create mode 100644 data/2022/neurips/Rethinking Lipschitz Neural Networks and Certified Robustness: A Boolean Function Perspective create mode 100644 data/2022/neurips/Rethinking Resolution in the Context of Efficient Video Recognition create mode 100644 data/2022/neurips/Rethinking Value Function Learning for Generalization in Reinforcement Learning create mode 100644 data/2022/neurips/Rethinking Variational Inference for Probabilistic Programs with Stochastic Support create mode 100644 data/2022/neurips/Rethinking and Improving Robustness of Convolutional Neural Networks: a Shapley Value-based Approach in Frequency Domain create mode 100644 data/2022/neurips/Rethinking and Scaling Up Graph Contrastive Learning: An Extremely Efficient Approach with Group Discrimination create mode 100644 data/2022/neurips/Rethinking the Reverse-engineering of Trojan Triggers create mode 100644 data/2022/neurips/Rethinking the compositionality of point clouds through regularization in the hyperbolic space create mode 100644 data/2022/neurips/Retrieval-Augmented Diffusion Models create mode 100644 data/2022/neurips/Retrieve, Reason, and Refine: Generating Accurate and Faithful Patient Instructions create mode 100644 data/2022/neurips/Retrospective Adversarial Replay for Continual Learning create mode 100644 data/2022/neurips/Revisit last-iterate convergence of mSGD under milder requirement on step size create mode 100644 data/2022/neurips/Revisiting Active Sets for Gaussian Process Decoders create mode 100644 data/2022/neurips/Revisiting Graph Contrastive Learning from the Perspective of Graph Spectrum create mode 100644 data/2022/neurips/Revisiting Heterophily For Graph Neural Networks create mode 100644 data/2022/neurips/Revisiting Injective Attacks on Recommender Systems create mode 100644 data/2022/neurips/Revisiting Neural Scaling Laws in Language and Vision create mode 100644 data/2022/neurips/Revisiting Non-Parametric Matching Cost Volumes for Robust and Generalizable Stereo Matching create 
mode 100644 data/2022/neurips/Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization create mode 100644 data/2022/neurips/Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering create mode 100644 data/2022/neurips/Revisiting Sliced Wasserstein on Images: From Vectorization to Convolution create mode 100644 data/2022/neurips/Revisiting Sparse Convolutional Model for Visual Recognition create mode 100644 data/2022/neurips/Riemannian Diffusion Models create mode 100644 data/2022/neurips/Riemannian Neural SDE: Learning Stochastic Representations on Manifolds create mode 100644 data/2022/neurips/Riemannian Score-Based Generative Modelling create mode 100644 data/2022/neurips/Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime create mode 100644 data/2022/neurips/Risk-Driven Design of Perception Systems create mode 100644 data/2022/neurips/Roadblocks for Temporarily Disabling Shortcuts and Learning New Knowledge create mode 100644 data/2022/neurips/Robust Anytime Learning of Markov Decision Processes create mode 100644 data/2022/neurips/Robust Bayesian Regression via Hard Thresholding create mode 100644 data/2022/neurips/Robust Binary Models by Pruning Randomly-initialized Networks create mode 100644 data/2022/neurips/Robust Calibration with Multi-domain Temperature Scaling create mode 100644 data/2022/neurips/Robust Feature-Level Adversaries are Interpretability Tools create mode 100644 data/2022/neurips/Robust Generalized Method of Moments: A Finite Sample Viewpoint create mode 100644 data/2022/neurips/Robust Graph Structure Learning via Multiple Statistical Tests create mode 100644 data/2022/neurips/Robust Imitation of a Few Demonstrations with a Backwards Model create mode 100644 data/2022/neurips/Robust Imitation via Mirror Descent Inverse Reinforcement Learning create mode 100644 data/2022/neurips/Robust Learning against Relational Adversaries create mode 100644 data/2022/neurips/Robust Model Selection and Nearly-Proper Learning for GMMs create mode 100644 data/2022/neurips/Robust Models are less Over-Confident create mode 100644 data/2022/neurips/Robust Neural Posterior Estimation and Statistical Model Criticism create mode 100644 data/2022/neurips/Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning create mode 100644 data/2022/neurips/Robust Reinforcement Learning using Offline Data create mode 100644 data/2022/neurips/Robust Rent Division create mode 100644 data/2022/neurips/Robust Semi-Supervised Learning when Not All Classes have Labels create mode 100644 data/2022/neurips/Robust Streaming PCA create mode 100644 data/2022/neurips/Robust Testing in High-Dimensional Sparse Models create mode 100644 data/2022/neurips/Robustness Analysis of Video-Language Models Against Visual and Language Perturbations create mode 100644 data/2022/neurips/Robustness Disparities in Face Detection create mode 100644 data/2022/neurips/Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization) create mode 100644 data/2022/neurips/Robustness to Label Noise Depends on the Shape of the Noise Distribution create mode 100644 data/2022/neurips/Robustness to Unbounded Smoothness of Generalized SignSGD create mode 100644 data/2022/neurips/Root Cause Analysis of Failures in Microservices through Causal Discovery create mode 100644 data/2022/neurips/Rotation-Equivariant Conditional Spherical Neural Fields for Learning a Natural 
Illumination Prior create mode 100644 "data/2022/neurips/R\303\251nyiCL: Contrastive Representation Learning with Skew R\303\251nyi Divergence" create mode 100644 data/2022/neurips/S-PIFu: Integrating Parametric Human Models with PIFu for Single-view Clothed Human Reconstruction create mode 100644 data/2022/neurips/S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning create mode 100644 data/2022/neurips/S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning create mode 100644 data/2022/neurips/S3-NeRF: Neural Reflectance Field from Shading and Shadow under a Single Viewpoint create mode 100644 data/2022/neurips/S3GC: Scalable Self-Supervised Graph Clustering create mode 100644 data/2022/neurips/S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces create mode 100644 data/2022/neurips/SALSA: Attacking Lattice Cryptography with Transformers create mode 100644 data/2022/neurips/SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections create mode 100644 data/2022/neurips/SAPA: Similarity-Aware Point Affiliation for Feature Upsampling create mode 100644 data/2022/neurips/SAPD+: An Accelerated Stochastic Method for Nonconvex-Concave Minimax Problems create mode 100644 data/2022/neurips/SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training create mode 100644 data/2022/neurips/SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos create mode 100644 data/2022/neurips/SAViT: Structure-Aware Vision Transformer Pruning via Collaborative Optimization create mode 100644 data/2022/neurips/SCAMPS: Synthetics for Camera Measurement of Physiological Signals create mode 100644 data/2022/neurips/SCINet: Time Series Modeling and Forecasting with Sample Convolution and Interaction create mode 100644 data/2022/neurips/SCL-WC: Cross-Slide Contrastive Learning for Weakly-Supervised Whole-Slide Image Classification create mode 100644 data/2022/neurips/SCONE: Surface Coverage Optimization in Unknown Environments by Volumetric Integration create mode 100644 data/2022/neurips/SGAM: Building a Virtual 3D World through Simultaneous Generation and Mapping create mode 100644 data/2022/neurips/SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning create mode 100644 data/2022/neurips/SHINE: SubHypergraph Inductive Neural nEtwork create mode 100644 data/2022/neurips/SIREN: Shaping Representations for Detecting Out-of-Distribution Objects create mode 100644 data/2022/neurips/SIXO: Smoothing Inference with Twisted Objectives create mode 100644 data/2022/neurips/SInGE: Sparsity via Integrated Gradients Estimation of Neuron Relevance create mode 100644 data/2022/neurips/SKFlow: Learning Optical Flow with Super Kernels create mode 100644 data/2022/neurips/SMPL: Simulated Industrial Manufacturing and Process Control Learning Environments create mode 100644 data/2022/neurips/SNAKE: Shape-aware Neural 3D Keypoint Field create mode 100644 data/2022/neurips/SNN-RAT: Robustness-enhanced Spiking Neural Network through Regularized Adversarial Training create mode 100644 data/2022/neurips/SPD domain-specific batch normalization to crack interpretable unsupervised domain adaptation in EEG create mode 100644 data/2022/neurips/SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning create mode 100644 data/2022/neurips/SPoVT: Semantic-Prototype Variational Transformer for Dense Point Cloud Semantic Completion create mode 
100644 data/2022/neurips/SQ Lower Bounds for Learning Single Neurons with Massart Noise create mode 100644 data/2022/neurips/ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning create mode 100644 data/2022/neurips/STNDT: Modeling Neural Population Activity with Spatiotemporal Transformers create mode 100644 data/2022/neurips/STaR: Bootstrapping Reasoning With Reasoning create mode 100644 data/2022/neurips/Safe Opponent-Exploitation Subgame Refinement create mode 100644 data/2022/neurips/SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles create mode 100644 data/2022/neurips/Safety Guarantees for Neural Network Dynamic Systems via Stochastic Barrier Functions create mode 100644 data/2022/neurips/SageMix: Saliency-Guided Mixup for Point Clouds create mode 100644 data/2022/neurips/Saliency-Aware Neural Architecture Search create mode 100644 data/2022/neurips/Sample Complexity of Learning Heuristic Functions for Greedy-Best-First and A* Search create mode 100644 data/2022/neurips/Sample Constrained Treatment Effect Estimation create mode 100644 data/2022/neurips/Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization create mode 100644 data/2022/neurips/Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games create mode 100644 data/2022/neurips/Sample-Efficient Reinforcement Learning of Partially Observable Markov Games create mode 100644 data/2022/neurips/Sample-Then-Optimize Batch Neural Thompson Sampling create mode 100644 data/2022/neurips/Sampling from Log-Concave Distributions with Infinity-Distance Guarantees create mode 100644 data/2022/neurips/Sampling in Constrained Domains with Orthogonal-Space Variational Gradient Descent create mode 100644 data/2022/neurips/Sampling with Riemannian Hamiltonian Monte Carlo in a Constrained Space create mode 100644 data/2022/neurips/Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization create mode 100644 data/2022/neurips/SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery create mode 100644 data/2022/neurips/Scalable Distributional Robustness in a Class of Non-Convex Optimization with Guarantees create mode 100644 data/2022/neurips/Scalable Infomin Learning create mode 100644 data/2022/neurips/Scalable Interpretability via Polynomials create mode 100644 data/2022/neurips/Scalable Multi-agent Covering Option Discovery based on Kronecker Graphs create mode 100644 data/2022/neurips/Scalable Neural Video Representations with Learnable Positional Features create mode 100644 data/2022/neurips/Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees create mode 100644 data/2022/neurips/Scalable Sensitivity and Uncertainty Analyses for Causal-Effect Estimates of Continuous-Valued Interventions create mode 100644 data/2022/neurips/Scalable and Efficient Non-adaptive Deterministic Group Testing create mode 100644 data/2022/neurips/Scalable and Efficient Training of Large Convolutional Neural Networks with Differential Privacy create mode 100644 data/2022/neurips/Scalable design of Error-Correcting Output Codes using Discrete Optimization with Graph Coloring create mode 100644 data/2022/neurips/Scale-invariant Learning by Physics Inversion create mode 100644 data/2022/neurips/Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning create mode 100644 data/2022/neurips/Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization create mode 100644 
data/2022/neurips/Score-Based Diffusion meets Annealed Importance Sampling create mode 100644 data/2022/neurips/Score-Based Generative Models Detect Manifolds create mode 100644 data/2022/neurips/Score-based Generative Modeling Secretly Minimizes the Wasserstein Distance create mode 100644 data/2022/neurips/Searching for Better Spatio-temporal Alignment in Few-Shot Action Recognition create mode 100644 data/2022/neurips/Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits create mode 100644 data/2022/neurips/SecureFedYJ: a safe feature Gaussianization protocol for Federated Learning create mode 100644 data/2022/neurips/Seeing the forest and the tree: Building representations of both individual and collective dynamics with transformers create mode 100644 data/2022/neurips/SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation create mode 100644 data/2022/neurips/SegViT: Semantic Segmentation with Plain Vision Transformers create mode 100644 data/2022/neurips/Segmenting Moving Objects via an Object-Centric Layered Representation create mode 100644 data/2022/neurips/SelecMix: Debiased Learning by Contradicting-pair Sampling create mode 100644 data/2022/neurips/Selective compression learning of latent representations for variable-rate image compression create mode 100644 data/2022/neurips/Self-Aware Personalized Federated Learning create mode 100644 data/2022/neurips/Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks create mode 100644 data/2022/neurips/Self-Explaining Deviations for Coordination create mode 100644 data/2022/neurips/Self-Organized Group for Cooperative Multi-agent Reinforcement Learning create mode 100644 data/2022/neurips/Self-Similarity Priors: Neural Collages as Differentiable Fractal Representations create mode 100644 data/2022/neurips/Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition create mode 100644 data/2022/neurips/Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency create mode 100644 data/2022/neurips/Self-Supervised Fair Representation Learning without Demographics create mode 100644 data/2022/neurips/Self-Supervised Image Restoration with Blurry and Noisy Pairs create mode 100644 data/2022/neurips/Self-Supervised Learning Through Efference Copies create mode 100644 data/2022/neurips/Self-Supervised Learning of Brain Dynamics from Broad Neuroimaging Data create mode 100644 data/2022/neurips/Self-Supervised Learning via Maximum Entropy Coding create mode 100644 data/2022/neurips/Self-Supervised Learning with an Information Maximization Criterion create mode 100644 data/2022/neurips/Self-Supervised Pretraining for Large-Scale Point Clouds create mode 100644 data/2022/neurips/Self-Supervised Visual Representation Learning with Semantic Grouping create mode 100644 data/2022/neurips/Self-explaining deep models with logic rule reasoning create mode 100644 data/2022/neurips/Self-supervised Amodal Video Object Segmentation create mode 100644 data/2022/neurips/Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering create mode 100644 data/2022/neurips/Self-supervised surround-view depth estimation with volumetric feature fusion create mode 100644 data/2022/neurips/SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders create mode 100644 data/2022/neurips/Semantic Diffusion Network for Semantic Segmentation create mode 100644 data/2022/neurips/Semantic Exploration from Language 
Abstractions and Pretrained Representations create mode 100644 data/2022/neurips/Semantic Probabilistic Layers for Neuro-Symbolic Learning create mode 100644 data/2022/neurips/Semantic uncertainty intervals for disentangled latent spaces create mode 100644 data/2022/neurips/Semi-Discrete Normalizing Flows through Differentiable Tessellation create mode 100644 data/2022/neurips/Semi-Supervised Generative Models for Multiagent Trajectories create mode 100644 data/2022/neurips/Semi-Supervised Learning with Decision Trees: Graph Laplacian Tree Alternating Optimization create mode 100644 data/2022/neurips/Semi-Supervised Semantic Segmentation via Gentle Teaching Assistant create mode 100644 data/2022/neurips/Semi-Supervised Video Salient Object Detection Based on Uncertainty-Guided Pseudo Labels create mode 100644 data/2022/neurips/Semi-infinitely Constrained Markov Decision Processes create mode 100644 data/2022/neurips/Semi-supervised Active Linear Regression create mode 100644 data/2022/neurips/Semi-supervised Semantic Segmentation with Prototype-based Consistency Regularization create mode 100644 data/2022/neurips/Semi-supervised Vision Transformers at Scale create mode 100644 data/2022/neurips/SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training create mode 100644 data/2022/neurips/SeqPATE: Differentially Private Text Generation via Knowledge Distillation create mode 100644 data/2022/neurips/Sequence Model Imitation Learning with Unobserved Contexts create mode 100644 data/2022/neurips/Sequence-to-Set Generative Models create mode 100644 data/2022/neurips/Sequencer: Deep LSTM for Image Classification create mode 100644 data/2022/neurips/Sequential Information Design: Learning to Persuade in the Dark create mode 100644 data/2022/neurips/Set-based Meta-Interpolation for Few-Task Meta-Learning create mode 100644 data/2022/neurips/Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer create mode 100644 data/2022/neurips/Shape And Structure Preserving Differential Privacy create mode 100644 data/2022/neurips/Shape, Light, and Material Decomposition from Images using Monte Carlo Rendering and Denoising create mode 100644 data/2022/neurips/ShapeCrafter: A Recursive Text-Conditioned 3D Shape Generation Model create mode 100644 data/2022/neurips/Sharing Knowledge for Meta-learning with Feature Descriptions create mode 100644 data/2022/neurips/Sharp Analysis of Stochastic Optimization under Global Kurdyka-Lojasiewicz Inequality create mode 100644 data/2022/neurips/Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning create mode 100644 data/2022/neurips/Sharpness-Aware Training for Free create mode 100644 data/2022/neurips/Shield Decentralization for Safe Multi-Agent Reinforcement Learning create mode 100644 data/2022/neurips/ShuffleMixer: An Efficient ConvNet for Image Super-Resolution create mode 100644 data/2022/neurips/SignRFF: Sign Random Fourier Features create mode 100644 data/2022/neurips/Signal Processing for Implicit Neural Representations create mode 100644 data/2022/neurips/Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse create mode 100644 data/2022/neurips/Signal Recovery with Non-Expansive Generative Network Priors create mode 100644 data/2022/neurips/Simple Mechanisms for Welfare Maximization in Rich Advertising Auctions create mode 100644 data/2022/neurips/Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos create mode 
100644 data/2022/neurips/Simple and Optimal Greedy Online Contention Resolution Schemes create mode 100644 data/2022/neurips/Simplified Graph Convolution with Heterophily create mode 100644 data/2022/neurips/Simulation-guided Beam Search for Neural Combinatorial Optimization create mode 100644 data/2022/neurips/Simultaneous Missing Value Imputation and Structure Learning with Groups create mode 100644 data/2022/neurips/Single Loop Gaussian Homotopy Method for Non-convex Optimization create mode 100644 data/2022/neurips/Single Model Uncertainty Estimation via Stochastic Data Centering create mode 100644 data/2022/neurips/Single-Stage Visual Relationship Learning using Conditional Queries create mode 100644 data/2022/neurips/Single-pass Streaming Lower Bounds for Multi-armed Bandits Exploration with Instance-sensitive Sample Complexity create mode 100644 data/2022/neurips/Single-phase deep learning in cortico-cortical networks create mode 100644 data/2022/neurips/Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning create mode 100644 data/2022/neurips/Size and depth of monotone neural networks: interpolation and approximation create mode 100644 data/2022/neurips/SizeShiftReg: a Regularization Method for Improving Size-Generalization in Graph Neural Networks create mode 100644 data/2022/neurips/Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity create mode 100644 data/2022/neurips/SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems create mode 100644 data/2022/neurips/Sketching based Representations for Robust Image Classification with Provable Guarantees create mode 100644 data/2022/neurips/Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning create mode 100644 data/2022/neurips/SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained debugging and analysis create mode 100644 data/2022/neurips/Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch create mode 100644 data/2022/neurips/Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions create mode 100644 data/2022/neurips/Smoothed Embeddings for Certified Few-Shot Learning create mode 100644 data/2022/neurips/Smoothed Online Convex Optimization Based on Discounted-Normal-Predictor create mode 100644 data/2022/neurips/SnAKe: Bayesian Optimization with Pathwise Exploration create mode 100644 data/2022/neurips/So3krates: Equivariant attention for interactions on arbitrary length-scales in molecular systems create mode 100644 data/2022/neurips/SoLar: Sinkhorn Label Refinery for Imbalanced Partial-Label Learning create mode 100644 data/2022/neurips/Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent create mode 100644 data/2022/neurips/Social-Inverse: Inverse Decision-making of Social Contagion Management with Task Migrations create mode 100644 data/2022/neurips/Society of Agents: Regret Bounds of Concurrent Thompson Sampling create mode 100644 data/2022/neurips/SoftPatch: Unsupervised Anomaly Detection with Noisy Data create mode 100644 data/2022/neurips/Solving Quantitative Reasoning Problems with Language Models create mode 100644 data/2022/neurips/SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression create mode 100644 data/2022/neurips/Sound and Complete Causal Identification with Latent Variables Given Local Background Knowledge create mode 
100644 data/2022/neurips/Sound and Complete Verification of Polynomial Networks create mode 100644 data/2022/neurips/SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning create mode 100644 data/2022/neurips/SparCL: Sparse Continual Learning on the Edge create mode 100644 data/2022/neurips/Sparse Fourier Backpropagation in Cryo-EM Reconstruction create mode 100644 data/2022/neurips/Sparse Gaussian Process Hyperparameters: Optimize or Integrate? create mode 100644 data/2022/neurips/Sparse Hypergraph Community Detection Thresholds in Stochastic Block Model create mode 100644 data/2022/neurips/Sparse Interaction Additive Networks via Feature Interaction Detection and Sparse Selection create mode 100644 data/2022/neurips/Sparse Probabilistic Circuits via Pruning and Growing create mode 100644 data/2022/neurips/Sparse Structure Search for Delta Tuning create mode 100644 data/2022/neurips/Sparse Winning Tickets are Data-Efficient Image Recognizers create mode 100644 data/2022/neurips/Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection create mode 100644 data/2022/neurips/Sparsity in Continuous-Depth Neural Networks create mode 100644 data/2022/neurips/Spartan: Differentiable Sparsity via Regularized Transportation create mode 100644 data/2022/neurips/Spatial Mixture-of-Experts create mode 100644 data/2022/neurips/Spatial Pruned Sparse Convolution for Efficient 3D Object Detection create mode 100644 data/2022/neurips/Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime create mode 100644 data/2022/neurips/Spectrum Random Masking for Generalization in Image-based Reinforcement Learning create mode 100644 data/2022/neurips/Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions create mode 100644 data/2022/neurips/Spherical Channels for Modeling Atomic Interactions create mode 100644 data/2022/neurips/Spherization Layer: Representation Using Only Angles create mode 100644 data/2022/neurips/Split-kl and PAC-Bayes-split-kl Inequalities for Ternary Random Variables create mode 100644 data/2022/neurips/Squeezeformer: An Efficient Transformer for Automatic Speech Recognition create mode 100644 data/2022/neurips/Stability Analysis and Generalization Bounds of Adversarial Training create mode 100644 data/2022/neurips/Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks create mode 100644 data/2022/neurips/Stability and Generalization for Markov Chain Stochastic Gradient Methods create mode 100644 data/2022/neurips/Stability and Generalization of Kernel Clustering: from Single Kernel to Multiple Kernel create mode 100644 data/2022/neurips/Staggered Rollout Designs Enable Causal Inference Under Interference Without Network Knowledge create mode 100644 data/2022/neurips/Staircase Attention for Recurrent Processing of Sequences create mode 100644 data/2022/neurips/Star Temporal Classification: Sequence Modeling with Partially Labeled Data create mode 100644 data/2022/neurips/Stars: Tera-Scale Graph Building for Clustering and Learning create mode 100644 data/2022/neurips/Statistical Learning and Inverse Problems: A Stochastic Gradient Approach create mode 100644 data/2022/neurips/Statistical, Robustness, and Computational Guarantees for Sliced Wasserstein Distances create mode 100644 data/2022/neurips/Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers create mode 100644 data/2022/neurips/Stimulative Training of Residual Networks: A Social Psychology 
Perspective of Loafing create mode 100644 data/2022/neurips/Stochastic Adaptive Activation Function create mode 100644 data/2022/neurips/Stochastic Halpern Iteration with Variance Reduction for Stochastic Monotone Inclusions create mode 100644 data/2022/neurips/Stochastic Multiple Target Sampling Gradient Descent create mode 100644 data/2022/neurips/Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality create mode 100644 data/2022/neurips/Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Functions create mode 100644 data/2022/neurips/Stochastic Window Transformer for Image Restoration create mode 100644 data/2022/neurips/Streaming Radiance Fields for 3D Video Synthesis create mode 100644 data/2022/neurips/StrokeRehab: A Benchmark Dataset for Sub-second Action Identification create mode 100644 data/2022/neurips/Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts create mode 100644 data/2022/neurips/Structural Kernel Search via Bayesian Optimization and Symbolical Optimal Transport create mode 100644 data/2022/neurips/Structural Knowledge Distillation for Object Detection create mode 100644 data/2022/neurips/Structural Pruning via Latency-Saliency Knapsack create mode 100644 data/2022/neurips/Structure-Aware Image Segmentation with Homotopy Warping create mode 100644 data/2022/neurips/Structure-Preserving 3D Garment Modeling with Neural Sewing Machines create mode 100644 data/2022/neurips/Structured Energy Network As a Loss create mode 100644 data/2022/neurips/Structured Recognition for Generative Models with Explaining Away create mode 100644 data/2022/neurips/Structuring Representations Using Group Invariants create mode 100644 data/2022/neurips/Structuring Uncertainty for Fine-Grained Sampling in Stochastic Segmentation Networks create mode 100644 data/2022/neurips/Sub-exponential time Sum-of-Squares lower bounds for Principal Components Analysis create mode 100644 data/2022/neurips/Subgame Solving in Adversarial Team Games create mode 100644 data/2022/neurips/Subgroup Robustness Grows On Trees: An Empirical Baseline Investigation create mode 100644 data/2022/neurips/Sublinear Algorithms for Hierarchical Clustering create mode 100644 data/2022/neurips/Submodular Maximization in Clean Linear Time create mode 100644 data/2022/neurips/Subquadratic Kronecker Regression with Applications to Tensor Decomposition create mode 100644 data/2022/neurips/Subsidiary Prototype Alignment for Universal Domain Adaptation create mode 100644 data/2022/neurips/Subspace Recovery from Heterogeneous Data with Non-isotropic Noise create mode 100644 data/2022/neurips/Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap create mode 100644 data/2022/neurips/Supervised Training of Conditional Monge Maps create mode 100644 data/2022/neurips/Supervising the Multi-Fidelity Race of Hyperparameter Configurations create mode 100644 data/2022/neurips/Support Recovery in Sparse PCA with Incomplete Data create mode 100644 data/2022/neurips/Supported Policy Optimization for Offline Reinforcement Learning create mode 100644 data/2022/neurips/SurDis: A Surface Discontinuity Dataset for Wearable Technology to Assist Blind Navigation in Urban Environments create mode 100644 data/2022/neurips/Surprise Minimizing Multi-Agent Learning with Energy-based Models create mode 100644 data/2022/neurips/Sustainable Online Reinforcement Learning for Auto-bidding create mode 100644 
data/2022/neurips/SwinTrack: A Simple and Strong Baseline for Transformer Tracking create mode 100644 data/2022/neurips/Sym-NCO: Leveraging Symmetricity for Neural Combinatorial Optimization create mode 100644 data/2022/neurips/Symbolic Distillation for Learned TCP Congestion Control create mode 100644 data/2022/neurips/Symmetry Teleportation for Accelerated Optimization create mode 100644 data/2022/neurips/Symmetry-induced Disentanglement on Graphs create mode 100644 data/2022/neurips/Symplectic Spectrum Gaussian Processes: Learning Hamiltonians from Noisy and Sparse Data create mode 100644 data/2022/neurips/Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms create mode 100644 data/2022/neurips/Synergy-of-Experts: Collaborate to Improve Adversarial Robustness create mode 100644 data/2022/neurips/Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning create mode 100644 data/2022/neurips/Systematic improvement of neural network quantum states using Lanczos create mode 100644 data/2022/neurips/TA-GATES: An Encoding Scheme for Neural Network Architectures create mode 100644 data/2022/neurips/TA-MoE: Topology-Aware Large Scale Mixture-of-Expert Training create mode 100644 data/2022/neurips/TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition create mode 100644 data/2022/neurips/TANKBind: Trigonometry-Aware Neural NetworKs for Drug-Protein Binding Structure Prediction create mode 100644 data/2022/neurips/TAP-Vid: A Benchmark for Tracking Any Point in a Video create mode 100644 data/2022/neurips/TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels create mode 100644 data/2022/neurips/TGEA 2.0: A Large-Scale Diagnostically Annotated Dataset with Benchmark Tasks for Text Generation of Pretrained Language Models create mode 100644 data/2022/neurips/TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation create mode 100644 data/2022/neurips/TPU-KNN: K Nearest Neighbor Search at Peak FLOP s create mode 100644 data/2022/neurips/TREC: Transient Redundancy Elimination-based Convolution create mode 100644 data/2022/neurips/TTOpt: A Maximum Volume Quantized Tensor Train-based Optimization and its Application to Reinforcement Learning create mode 100644 data/2022/neurips/TUSK: Task-Agnostic Unsupervised Keypoints create mode 100644 data/2022/neurips/TVLT: Textless Vision-Language Transformer create mode 100644 data/2022/neurips/TaSIL: Taylor Series Imitation Learning create mode 100644 data/2022/neurips/TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets create mode 100644 data/2022/neurips/TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training create mode 100644 "data/2022/neurips/Taming Fat-Tailed (\"Heavier-Tailed\" with Potentially Infinite Variance) Noise in Federated Learning" create mode 100644 data/2022/neurips/TarGF: Learning Target Gradient Field to Rearrange Objects without Explicit Goal Specification create mode 100644 data/2022/neurips/Target alignment in truncated kernel ridge regression create mode 100644 data/2022/neurips/Task Discovery: Finding the Tasks that Neural Networks Generalize on create mode 100644 data/2022/neurips/Task-Agnostic Graph Explanations create mode 100644 data/2022/neurips/Task-Free Continual Learning via Online Discrepancy Distance Learning create mode 100644 data/2022/neurips/Task-level Differentially Private Meta Learning create mode 100644 
data/2022/neurips/Teach Less, Learn More: On the Undistillable Classes in Knowledge Distillation create mode 100644 data/2022/neurips/Teacher Forcing Recovers Reward Functions for Text Generation create mode 100644 data/2022/neurips/TempEL: Linking Dynamically Evolving and Newly Emerging Entities create mode 100644 data/2022/neurips/Template based Graph Neural Network with Optimal Transport Distances create mode 100644 data/2022/neurips/Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction create mode 100644 data/2022/neurips/Temporal Effective Batch Normalization in Spiking Neural Networks create mode 100644 data/2022/neurips/Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning create mode 100644 data/2022/neurips/Temporally Disentangled Representation Learning create mode 100644 data/2022/neurips/Temporally-Consistent Survival Analysis create mode 100644 data/2022/neurips/Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems create mode 100644 data/2022/neurips/Tensor Program Optimization with Probabilistic Programs create mode 100644 data/2022/neurips/Tensor Wheel Decomposition and Its Tensor Completion Application create mode 100644 data/2022/neurips/Test Time Adaptation via Conjugate Pseudo-labels create mode 100644 data/2022/neurips/Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models create mode 100644 data/2022/neurips/Test-Time Training with Masked Autoencoders create mode 100644 data/2022/neurips/Text Classification with Born's Rule create mode 100644 data/2022/neurips/Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval create mode 100644 data/2022/neurips/The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset create mode 100644 data/2022/neurips/The Burer-Monteiro SDP method can fail even above the Barvinok-Pataki bound create mode 100644 data/2022/neurips/The Curse of Unrolling: Rate of Differentiating Through Optimization create mode 100644 data/2022/neurips/The Dollar Street Dataset: Images Representing the Geographic and Socioeconomic Diversity of the World create mode 100644 data/2022/neurips/The Effects of Regularization and Data Augmentation are Class Dependent create mode 100644 data/2022/neurips/The First Optimal Acceleration of High-Order Methods in Smooth Convex Optimization create mode 100644 data/2022/neurips/The First Optimal Algorithm for Smooth and Strongly-Convex-Strongly-Concave Minimax Optimization create mode 100644 data/2022/neurips/The Franz-Parisi Criterion and Computational Trade-offs in High Dimensional Statistics create mode 100644 data/2022/neurips/The Gyro-Structure of Some Matrix Manifolds create mode 100644 data/2022/neurips/The Hessian Screening Rule create mode 100644 data/2022/neurips/The Impact of Task Underspecification in Evaluating Deep Reinforcement Learning create mode 100644 data/2022/neurips/The Implicit Delta Method create mode 100644 data/2022/neurips/The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning create mode 100644 data/2022/neurips/The Minority Matters: A Diversity-Promoting Collaborative Metric Learning Algorithm create mode 100644 data/2022/neurips/The Missing Invariance Principle found - the Reciprocal Twin of Invariant Risk Minimization create mode 100644 data/2022/neurips/The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning create mode 100644 data/2022/neurips/The Neural Covariance SDE: Shaped 
Infinite Depth-and-Width Networks at Initialization create mode 100644 data/2022/neurips/The Neural Testbed: Evaluating Joint Predictions create mode 100644 data/2022/neurips/The Phenomenon of Policy Churn create mode 100644 data/2022/neurips/The Pitfalls of Regularization in Off-Policy TD Learning create mode 100644 data/2022/neurips/The Policy-gradient Placement and Generative Routing Neural Networks for Chip Design create mode 100644 data/2022/neurips/The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift create mode 100644 data/2022/neurips/The Privacy Onion Effect: Memorization is Relative create mode 100644 data/2022/neurips/The Query Complexity of Cake Cutting create mode 100644 data/2022/neurips/The Role of Baselines in Policy Gradient Optimization create mode 100644 data/2022/neurips/The Sample Complexity of One-Hidden-Layer Neural Networks create mode 100644 data/2022/neurips/The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models create mode 100644 data/2022/neurips/The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games create mode 100644 data/2022/neurips/The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes create mode 100644 data/2022/neurips/The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning create mode 100644 data/2022/neurips/The alignment property of SGD noise and how it helps select flat minima: A stability analysis create mode 100644 data/2022/neurips/The computational and learning benefits of Daleian neural networks create mode 100644 data/2022/neurips/The least-control principle for local learning at equilibrium create mode 100644 data/2022/neurips/The price of ignorance: how much does it cost to forget noise structure in low-rank matrix estimation? 
create mode 100644 data/2022/neurips/The price of unfairness in linear bandits with biased feedback create mode 100644 data/2022/neurips/The trade-offs of model size in large recommendation models : 100GB to 10MB Criteo-tb DLRM model create mode 100644 data/2022/neurips/Theoretical analysis of deep neural networks for temporally dependent observations create mode 100644 data/2022/neurips/Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques create mode 100644 data/2022/neurips/Theoretically Provable Spiking Neural Networks create mode 100644 data/2022/neurips/Theory and Approximate Solvers for Branched Optimal Transport with Multiple Sources create mode 100644 data/2022/neurips/Theseus: A Library for Differentiable Nonlinear Optimization create mode 100644 data/2022/neurips/Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization create mode 100644 data/2022/neurips/Thinned random measures for sparse graphs with overlapping communities create mode 100644 data/2022/neurips/This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish create mode 100644 data/2022/neurips/Thompson Sampling Efficiently Learns to Control Diffusion Processes create mode 100644 data/2022/neurips/Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers create mode 100644 data/2022/neurips/Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret create mode 100644 data/2022/neurips/Tight Analysis of Extra-gradient and Optimistic Gradient Methods For Nonconvex Minimax Problems create mode 100644 data/2022/neurips/Tight Lower Bounds on Worst-Case Guarantees for Zero-Shot Learning with Attributes create mode 100644 data/2022/neurips/Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization create mode 100644 data/2022/neurips/Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints create mode 100644 data/2022/neurips/Time-Conditioned Dances with Simplicial Complexes: Zigzag Filtration Curve based Supra-Hodge Convolution Networks for Time-series Forecasting create mode 100644 data/2022/neurips/To update or not to update? 
Neurons at equilibrium in deep models
 create mode 100644 data/2022/neurips/ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery
 create mode 100644 data/2022/neurips/TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers
 create mode 100644 data/2022/neurips/Top Two Algorithms Revisited
 create mode 100644 data/2022/neurips/Torsional Diffusion for Molecular Conformer Generation
 create mode 100644 data/2022/neurips/TotalSelfScan: Learning Full-body Avatars from Self-Portrait Videos of Faces, Hands, and Bodies
 create mode 100644 data/2022/neurips/Touch and Go: Learning from Human-Collected Vision and Touch
 create mode 100644 data/2022/neurips/Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
 create mode 100644 data/2022/neurips/Toward Robust Spiking Neural Network Against Adversarial Perturbation
 create mode 100644 data/2022/neurips/Toward Understanding Privileged Features Distillation in Learning-to-Rank
 create mode 100644 data/2022/neurips/Toward a realistic model of speech processing in the brain with self-supervised learning
 create mode 100644 data/2022/neurips/Towards Better Evaluation for Dynamic Link Prediction
 create mode 100644 data/2022/neurips/Towards Consistency in Adversarial Classification
 create mode 100644 data/2022/neurips/Towards Disentangling Information Paths with Coded ResNeXt
 create mode 100644 data/2022/neurips/Towards Diverse and Faithful One-shot Adaption of Generative Adversarial Networks
 create mode 100644 data/2022/neurips/Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization
 create mode 100644 data/2022/neurips/Towards Efficient 3D Object Detection with Knowledge Distillation
 create mode 100644 data/2022/neurips/Towards Efficient Post-training Quantization of Pre-trained Language Models
 create mode 100644 data/2022/neurips/Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning
 create mode 100644 data/2022/neurips/Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning
 create mode 100644 data/2022/neurips/Towards Improving Calibration in Object Detection Under Domain Shift
 create mode 100644 data/2022/neurips/Towards Improving Faithfulness in Abstractive Summarization
 create mode 100644 data/2022/neurips/Towards Learning Universal Hyperparameter Optimizers with Transformers
 create mode 100644 data/2022/neurips/Towards Lightweight Black-Box Attack Against Deep Neural Networks
 create mode 100644 data/2022/neurips/Towards Optimal Communication Complexity in Distributed Non-Convex Optimization
 create mode 100644 data/2022/neurips/Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment
 create mode 100644 data/2022/neurips/Towards Practical Control of Singular Values of Convolutional Layers
 create mode 100644 data/2022/neurips/Towards Practical Few-shot Query Sets: Transductive Minimum Description Length Inference
 create mode 100644 data/2022/neurips/Towards Reasonable Budget Allocation in Untargeted Graph Structure Attacks via Gradient Debias
 create mode 100644 data/2022/neurips/Towards Reliable Simulation-Based Inference with Balanced Neural Ratio Estimation
 create mode 100644 data/2022/neurips/Towards Robust Blind Face Restoration with Codebook Lookup Transformer
 create mode 100644 data/2022/neurips/Towards Safe Reinforcement Learning with a Safety Editor Policy
 create mode 100644 data/2022/neurips/Towards Theoretically Inspired Neural Initialization Optimization
 create mode 100644 data/2022/neurips/Towards Trustworthy Automatic Diagnosis Systems by Emulating Doctors' Reasoning with Deep Reinforcement Learning
 create mode 100644 data/2022/neurips/Towards Understanding Grokking: An Effective Theory of Representation Learning
 create mode 100644 data/2022/neurips/Towards Understanding the Condensation of Neural Networks at Initial Training
 create mode 100644 data/2022/neurips/Towards Understanding the Mixture-of-Experts Layer in Deep Learning
 create mode 100644 data/2022/neurips/Towards Versatile Embodied Navigation
 create mode 100644 data/2022/neurips/Towards Video Text Visual Question Answering: Benchmark and Baseline
 create mode 100644 data/2022/neurips/Towards a Standardised Performance Evaluation Protocol for Cooperative MARL
 create mode 100644 data/2022/neurips/Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees
 create mode 100644 data/2022/neurips/Tracking Functional Changes in Nonstationary Signals with Evolutionary Ensemble Bayesian Model for Robust Neural Decoding
 create mode 100644 data/2022/neurips/Tractable Function-Space Variational Inference in Bayesian Neural Networks
 create mode 100644 data/2022/neurips/Tractable Optimality in Episodic Latent MABs
 create mode 100644 data/2022/neurips/Trade-off between Payoff and Model Rewards in Shapley-Fair Collaborative Machine Learning
 create mode 100644 data/2022/neurips/Trading Off Resource Budgets For Improved Regret Bounds
 create mode 100644 data/2022/neurips/Trading off Image Quality for Robustness is not Necessary with Regularized Deterministic Autoencoders
 create mode 100644 data/2022/neurips/Trading off Utility, Informativeness, and Complexity in Emergent Communication
 create mode 100644 data/2022/neurips/Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
 create mode 100644 data/2022/neurips/Training Spiking Neural Networks with Event-driven Backpropagation
 create mode 100644 data/2022/neurips/Training Spiking Neural Networks with Local Tandem Learning
 create mode 100644 data/2022/neurips/Training Subset Selection for Weak Supervision
 create mode 100644 data/2022/neurips/Training Uncertainty-Aware Classifiers with Conformalized Deep Learning
 create mode 100644 data/2022/neurips/Training and Inference on Any-Order Autoregressive Models the Right Way
 create mode 100644 data/2022/neurips/Training language models to follow instructions with human feedback
 create mode 100644 data/2022/neurips/Training stochastic stabilized supralinear networks by dynamics-neutral growth
 create mode 100644 data/2022/neurips/Training with More Confidence: Mitigating Injected and Natural Backdoors During Training
 create mode 100644 data/2022/neurips/Trajectory Inference via Mean-field Langevin in Path Space
 create mode 100644 data/2022/neurips/Trajectory balance: Improved credit assignment in GFlowNets
 create mode 100644 data/2022/neurips/Trajectory of Mini-Batch Momentum: Batch Size Saturation and Convergence in High Dimensions
 create mode 100644 data/2022/neurips/Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline
 create mode 100644 data/2022/neurips/TransBoost: Improving the Best ImageNet Performance using Deep Transduction
 create mode 100644 data/2022/neurips/TransTab: Learning Transferable Tabular Transformers Across Tables
 create mode 100644 data/2022/neurips/Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling
 create mode 100644 data/2022/neurips/Transfer Learning on Heterogeneous Feature Spaces for Treatment Effects Estimation
 create mode 100644 data/2022/neurips/Transferring Fairness under Distribution Shifts via Fair Consistency Regularization
 create mode 100644 data/2022/neurips/Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching
 create mode 100644 data/2022/neurips/Transform Once: Efficient Operator Learning in Frequency Domain
 create mode 100644 data/2022/neurips/Transformer Memory as a Differentiable Search Index
 create mode 100644 data/2022/neurips/Transformer-based Working Memory for Multiagent Reinforcement Learning with Action Parsing
 create mode 100644 data/2022/neurips/Transformers from an Optimization Perspective
 create mode 100644 data/2022/neurips/Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
 create mode 100644 data/2022/neurips/Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture
 create mode 100644 data/2022/neurips/Translation-equivariant Representation in Recurrent Networks with a Continuous Manifold of Attractors
 create mode 100644 data/2022/neurips/Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork
 create mode 100644 data/2022/neurips/Tree Mover's Distance: Bridging Graph Metrics and Stability of Graph Neural Networks
 create mode 100644 data/2022/neurips/Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces
 create mode 100644 data/2022/neurips/TreeMoCo: Contrastive Neuron Morphology Representation Learning
 create mode 100644 data/2022/neurips/Triangulation candidates for Bayesian optimization
 create mode 100644 data/2022/neurips/Trimmed Maximum Likelihood Estimation for Robust Generalized Linear Model
 create mode 100644 data/2022/neurips/Truly Deterministic Policy Optimization
 create mode 100644 data/2022/neurips/Truncated Matrix Power Iteration for Differentiable DAG Learning
 create mode 100644 data/2022/neurips/Truncated proposals for scalable and hassle-free simulation-based inference
 create mode 100644 data/2022/neurips/Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for Continuous Actions
 create mode 100644 data/2022/neurips/Trustworthy Monte Carlo
 create mode 100644 data/2022/neurips/Tsetlin Machine for Solving Contextual Bandit Problems
 create mode 100644 data/2022/neurips/Turbocharging Solution Concepts: Solving NEs, CEs and CCEs with Neural Equilibrium Solvers
 create mode 100644 data/2022/neurips/Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation
 create mode 100644 data/2022/neurips/TweetNERD - End to End Entity Linking Benchmark for Tweets
 create mode 100644 data/2022/neurips/TwiBot-22: Towards Graph-Based Twitter Bot Detection
 create mode 100644 data/2022/neurips/Two-Stream Network for Sign Language Recognition and Translation
 create mode 100644 data/2022/neurips/Two-layer neural network on infinite dimensional data: global optimization guarantee in the mean-field regime
 create mode 100644 data/2022/neurips/UDC: Unified DNAS for Compressible TinyML Models for Neural Processing Units
 create mode 100644 data/2022/neurips/ULNeF: Untangled Layered Neural Fields for Mix-and-Match Virtual Try-On
 create mode 100644 data/2022/neurips/UMIX: Improving Importance Weighting for Subpopulation Shift via Uncertainty-Aware Mixup
 create mode 100644 data/2022/neurips/UQGAN: A Unified Model for Uncertainty Quantification of Deep Classifiers trained via Conditional GANs
 create mode 100644 data/2022/neurips/USB: A Unified Semi-supervised Learning Benchmark for Classification
 create mode 100644 data/2022/neurips/UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
 create mode 100644 data/2022/neurips/Uncalibrated Models Can Improve Human-AI Collaboration
 create mode 100644 data/2022/neurips/Uncertainty Estimation Using Riemannian Model Dynamics for Offline Reinforcement Learning
 create mode 100644 data/2022/neurips/Uncertainty Estimation for Multi-view Data: The Power of Seeing the Whole Picture
 create mode 100644 data/2022/neurips/Uncertainty-Aware Hierarchical Refinement for Incremental Implicitly-Refined Classification
 create mode 100644 data/2022/neurips/Uncertainty-Aware Reinforcement Learning for Risk-Sensitive Player Evaluation in Sports Game
 create mode 100644 data/2022/neurips/Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games
 create mode 100644 data/2022/neurips/Uncovering the Structural Fairness in Graph Contrastive Learning
 create mode 100644 data/2022/neurips/Understanding Aesthetics with Language: A Photo Critique Dataset for Aesthetic Assessment
 create mode 100644 data/2022/neurips/Understanding Benign Overfitting in Gradient-Based Meta Learning
 create mode 100644 data/2022/neurips/Understanding Cross-Domain Few-Shot Learning Based on Domain Similarity and Few-Shot Difficulty
 create mode 100644 data/2022/neurips/Understanding Deep Contrastive Learning via Coordinate-wise Optimization
 create mode 100644 data/2022/neurips/Understanding Hyperdimensional Computing for Parallel Single-Pass Learning
 create mode 100644 data/2022/neurips/Understanding Non-linearity in Graph Neural Networks from the Bayesian-Inference Perspective
 create mode 100644 data/2022/neurips/Understanding Programmatic Weak Supervision via Source-aware Influence Function
 create mode 100644 data/2022/neurips/Understanding Robust Learning through the Lens of Representation Similarities
 create mode 100644 data/2022/neurips/Understanding Square Loss in Training Overparametrized Neural Network Classifiers
 create mode 100644 data/2022/neurips/Understanding and Extending Subgraph GNNs by Rethinking Their Symmetries
 create mode 100644 data/2022/neurips/Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation
 create mode 100644 data/2022/neurips/Understanding the Eluder Dimension
 create mode 100644 data/2022/neurips/Understanding the Evolution of Linear Regions in Deep Reinforcement Learning
 create mode 100644 data/2022/neurips/Understanding the Failure of Batch Normalization for Transformers in NLP
 create mode 100644 data/2022/neurips/Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
 create mode 100644 data/2022/neurips/UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification
 create mode 100644 data/2022/neurips/Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
 create mode 100644 data/2022/neurips/UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
 create mode 100644 data/2022/neurips/UniGAN: Reducing Mode Collapse in GANs using a Uniform Generator
 create mode 100644 data/2022/neurips/Uni[MASK]: Unified Inference in Sequential Decision Problems
 create mode 100644 data/2022/neurips/Unified Optimal Transport Framework for Universal Domain Adaptation
 create mode 100644 data/2022/neurips/Unifying Voxel-based Representation with Transformer for 3D Object Detection
 create mode 100644 data/2022/neurips/Unifying and Boosting Gradient-Based Training-Free Neural Architecture Search
 create mode 100644 data/2022/neurips/Universal Rates for Interactive Learning
 create mode 100644 data/2022/neurips/Universality of Group Convolutional Neural Networks Based on Ridgelet Analysis on Groups
 create mode 100644 data/2022/neurips/Universally Expressive Communication in Multi-Agent Reinforcement Learning
 create mode 100644 data/2022/neurips/Unknown-Aware Domain Adversarial Learning for Open-Set Domain Adaptation
 create mode 100644 data/2022/neurips/Unlabelled Sample Compression Schemes for Intersection-Closed Classes and Extremal Classes
 create mode 100644 data/2022/neurips/Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity
 create mode 100644 data/2022/neurips/Unravelling the Performance of Physics-informed Graph Neural Networks for Dynamical Systems
 create mode 100644 data/2022/neurips/Unsupervised Adaptation from Repeated Traversals for Autonomous Driving
 create mode 100644 data/2022/neurips/Unsupervised Causal Generative Understanding of Images
 create mode 100644 data/2022/neurips/Unsupervised Cross-Task Generalization via Retrieval Augmentation
 create mode 100644 data/2022/neurips/Unsupervised Domain Adaptation for Semantic Segmentation using Depth Distribution
 create mode 100644 data/2022/neurips/Unsupervised Image-to-Image Translation with Density Changing Regularization
 create mode 100644 data/2022/neurips/Unsupervised Learning From Incomplete Measurements for Inverse Problems
 create mode 100644 data/2022/neurips/Unsupervised Learning for Combinatorial Optimization with Principled Objective Relaxation
 create mode 100644 data/2022/neurips/Unsupervised Learning of Equivariant Structure from Sequences
 create mode 100644 data/2022/neurips/Unsupervised Learning of Group Invariant and Equivariant Representations
 create mode 100644 data/2022/neurips/Unsupervised Learning of Shape Programs with Repeatable Implicit Parts
 create mode 100644 data/2022/neurips/Unsupervised Learning under Latent Label Shift
 create mode 100644 data/2022/neurips/Unsupervised Multi-Object Segmentation by Predicting Probable Motion Patterns
 create mode 100644 data/2022/neurips/Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation
 create mode 100644 data/2022/neurips/Unsupervised Object Detection Pretraining with Joint Object Priors Generation and Detector Learning
 create mode 100644 data/2022/neurips/Unsupervised Object Representation Learning using Translation and Rotation Group Equivariant VAE
 create mode 100644 data/2022/neurips/Unsupervised Point Cloud Completion and Segmentation by Generative Adversarial Autoencoding Network
 create mode 100644 data/2022/neurips/Unsupervised Reinforcement Learning with Contrastive Intrinsic Control
 create mode 100644 data/2022/neurips/Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models
 create mode 100644 data/2022/neurips/Unsupervised Skill Discovery via Recurrent Skill Training
 create mode 100644 data/2022/neurips/Unsupervised Visual Representation Learning via Mutual Information Regularized Assignment
 create mode 100644 data/2022/neurips/Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection
 create mode 100644 data/2022/neurips/Uplifting Bandits
 create mode 100644 data/2022/neurips/Use-Case-Grounded Simulations for Explanation Evaluation
 create mode 100644 data/2022/neurips/Using Embeddings for Causal Estimation of Peer Influence in Social Networks
 create mode 100644 data/2022/neurips/Using Mixup as a Regularizer Can Surprisingly Improve Accuracy & Out-of-Distribution Robustness
 create mode 100644 data/2022/neurips/Using Partial Monotonicity in Submodular Maximization
 create mode 100644 data/2022/neurips/Using natural language and program abstractions to instill human inductive biases in machines
 create mode 100644 data/2022/neurips/VAEL: Bridging Variational Autoencoders and Probabilistic Logic Programming
 create mode 100644 data/2022/neurips/VCT: A Video Compression Transformer
 create mode 100644 data/2022/neurips/VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement
 create mode 100644 data/2022/neurips/VF-PS: How to Select Important Participants in Vertical Federated Learning, Efficiently and Securely?
 create mode 100644 data/2022/neurips/VICE: Variational Interpretable Concept Embeddings
 create mode 100644 data/2022/neurips/VICRegL: Self-Supervised Learning of Local Visual Features
 create mode 100644 data/2022/neurips/VITA: Video Instance Segmentation via Object Token Association
 create mode 100644 data/2022/neurips/VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
 create mode 100644 data/2022/neurips/VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
 create mode 100644 data/2022/neurips/VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
 create mode 100644 data/2022/neurips/VTC-LFC: Vision Transformer Compression with Low-Frequency Components
 create mode 100644 data/2022/neurips/VaiPhy: a Variational Inference Based Algorithm for Phylogeny
 create mode 100644 data/2022/neurips/Value Function Decomposition for Iterative Design of Reinforcement Learning Agents
 create mode 100644 data/2022/neurips/Variable-rate hierarchical CPC leads to acoustic unit discovery in speech
 create mode 100644 data/2022/neurips/Variance Reduced ProxSkip: Algorithm, Theory and Application to Federated Learning
 create mode 100644 data/2022/neurips/Variational Model Perturbation for Source-Free Domain Adaptation
 create mode 100644 data/2022/neurips/Variational inference via Wasserstein gradient flows
 create mode 100644 data/2022/neurips/VectorAdam for Rotation Equivariant Geometry Optimization
 create mode 100644 data/2022/neurips/VeriDark: A Large-Scale Benchmark for Authorship Verification on the Dark Web
 create mode 100644 data/2022/neurips/Verification and search algorithms for causal DAGs
 create mode 100644 data/2022/neurips/Versatile Multi-stage Graph Neural Network for Circuit Representation
 create mode 100644 data/2022/neurips/ViSioNS: Visual Search in Natural Scenes Benchmark
 create mode 100644 data/2022/neurips/ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
 create mode 100644 data/2022/neurips/Video Diffusion Models
 create mode 100644 data/2022/neurips/Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
 create mode 100644 data/2022/neurips/Video compression dataset and benchmark of learning-based video-quality metrics
 create mode 100644 data/2022/neurips/Video-based Human-Object Interaction Detection from Tubelet Tokens
 create mode 100644 data/2022/neurips/VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
 create mode 100644 data/2022/neurips/ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints
 create mode 100644 data/2022/neurips/VisCo Grids: Surface Reconstruction with Viscosity and Coarea Grids
 create mode 100644 data/2022/neurips/VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives
 create mode 100644 data/2022/neurips/Vision GNN: An Image is Worth Graph of Nodes
 create mode 100644 data/2022/neurips/Vision Transformers provably learn spatial structure
 create mode 100644 data/2022/neurips/Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
 create mode 100644 data/2022/neurips/Visual Concepts Tokenization
 create mode 100644 data/2022/neurips/Visual Prompting via Image Inpainting
 create mode 100644 data/2022/neurips/Visual correspondence-based explanations improve AI robustness and human-AI team accuracy
 create mode 100644 data/2022/neurips/VoiceBlock: Privacy through Real-Time Adversarial Attacks with Audio-to-Audio Models
 create mode 100644 data/2022/neurips/VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids
 create mode 100644 data/2022/neurips/WT-MVSNet: Window-based Transformers for Multi-view Stereo
 create mode 100644 data/2022/neurips/Washing The Unwashable : On The (Im)possibility of Fairwashing Detection
 create mode 100644 data/2022/neurips/Wasserstein $K$-means for clustering probability distributions
 create mode 100644 data/2022/neurips/Wasserstein Iterative Networks for Barycenter Estimation
 create mode 100644 data/2022/neurips/Wasserstein Logistic Regression with Mixed Features
 create mode 100644 data/2022/neurips/Watermarking for Out-of-distribution Detection
 create mode 100644 data/2022/neurips/WaveBound: Dynamic Error Bounds for Stable Time Series Forecasting
 create mode 100644 data/2022/neurips/Wavelet Feature Maps Compression for Image-to-Image CNNs
 create mode 100644 data/2022/neurips/Wavelet Score-Based Generative Modeling
 create mode 100644 data/2022/neurips/Weak-shot Semantic Segmentation via Dual Similarity Transfer
 create mode 100644 data/2022/neurips/Weakly Supervised Representation Learning with Sparse Perturbations
 create mode 100644 data/2022/neurips/Weakly supervised causal representation learning
 create mode 100644 data/2022/neurips/Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation
 create mode 100644 data/2022/neurips/WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
 create mode 100644 data/2022/neurips/Weighted Distillation with Unlabeled Examples
 create mode 100644 data/2022/neurips/Weighted Mutual Learning with Diversity-Driven Model Compression
 create mode 100644 data/2022/neurips/WeightedSHAP: analyzing and improving Shapley based feature attributions
 create mode 100644 data/2022/neurips/Weisfeiler and Leman Go Walking: Random Walk Kernels Revisited
 create mode 100644 data/2022/neurips/What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
 create mode 100644 data/2022/neurips/What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness?
 create mode 100644 data/2022/neurips/What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods
 create mode 100644 data/2022/neurips/What Makes Graph Neural Networks Miscalibrated?
 create mode 100644 "data/2022/neurips/What Makes a \"Good\" Data Augmentation in Knowledge Distillation - A Statistical Perspective"
 create mode 100644 data/2022/neurips/What You See is What You Classify: Black Box Attributions
 create mode 100644 data/2022/neurips/What You See is What You Get: Principled Deep Learning via Distributional Generalization
 create mode 100644 data/2022/neurips/What are the best Systems? New Perspectives on NLP Benchmarking
 create mode 100644 data/2022/neurips/What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs
 create mode 100644 data/2022/neurips/What is a Good Metric to Study Generalization of Minimax Learners?
 create mode 100644 data/2022/neurips/What's the Harm? Sharp Bounds on the Fraction Negatively Affected by Treatment
 create mode 100644 data/2022/neurips/When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture
 create mode 100644 data/2022/neurips/When Combinatorial Thompson Sampling meets Approximation Regret
 create mode 100644 data/2022/neurips/When Do Flat Minima Optimizers Work?
 create mode 100644 data/2022/neurips/When Does Differentially Private Learning Not Suffer in High Dimensions?
 create mode 100644 data/2022/neurips/When Does Group Invariant Learning Survive Spurious Correlations?
 create mode 100644 data/2022/neurips/When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits
 create mode 100644 data/2022/neurips/When are Local Queries Useful for Robust Learning?
 create mode 100644 data/2022/neurips/When are Offline Two-Player Zero-Sum Markov Games Solvable?
 create mode 100644 data/2022/neurips/When does dough become a bagel? Analyzing the remaining mistakes on ImageNet
 create mode 100644 data/2022/neurips/When does return-conditioned supervised learning work for offline reinforcement learning?
 create mode 100644 data/2022/neurips/When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning
 create mode 100644 data/2022/neurips/When to Intervene: Learning Optimal Intervention Policies for Critical Events
 create mode 100644 data/2022/neurips/When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
 create mode 100644 data/2022/neurips/When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning
 create mode 100644 data/2022/neurips/When to Update Your Model: Constrained Model-based Reinforcement Learning
 create mode 100644 data/2022/neurips/Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability
 create mode 100644 data/2022/neurips/Where to Pay Attention in Sparse Training for Feature Selection?
 create mode 100644 data/2022/neurips/Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps
 create mode 100644 data/2022/neurips/Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations
 create mode 100644 data/2022/neurips/Whitening Convergence Rate of Coupling-based Normalizing Flows
 create mode 100644 data/2022/neurips/Why Do Artificially Generated Data Help Adversarial Robustness
 create mode 100644 data/2022/neurips/Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power
 create mode 100644 data/2022/neurips/Why So Pessimistic? Estimating Uncertainties for Offline RL through Ensembles, and Why Their Independence Matters
 create mode 100644 data/2022/neurips/Why do We Need Large Batchsizes in Contrastive Learning? A Gradient-Bias Perspective
 create mode 100644 data/2022/neurips/Why do tree-based models still outperform deep learning on typical tabular data?
 create mode 100644 data/2022/neurips/Why neural networks find simple solutions: The many regularizers of geometric complexity
 create mode 100644 data/2022/neurips/Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time
 create mode 100644 data/2022/neurips/Will Bilevel Optimizers Benefit from Loops
 create mode 100644 data/2022/neurips/WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
 create mode 100644 data/2022/neurips/Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
 create mode 100644 data/2022/neurips/XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient
 create mode 100644 data/2022/neurips/You Can't Count on Luck: Why Decision Transformers and RvS Fail in Stochastic Environments
 create mode 100644 data/2022/neurips/You Never Stop Dancing: Non-freezing Dance Generation via Bank-constrained Manifold Projection
 create mode 100644 data/2022/neurips/You Only Live Once: Single-Life Reinforcement Learning
 create mode 100644 data/2022/neurips/Your Out-of-Distribution Detection Method is Not Robust!
 create mode 100644 data/2022/neurips/Your Transformer May Not be as Powerful as You Expect
 create mode 100644 data/2022/neurips/ZARTS: On Zero-order Optimization for Neural Architecture Search
 create mode 100644 data/2022/neurips/ZIN: When and How to Learn Invariance Without Environment Partition?
 create mode 100644 data/2022/neurips/ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings
 create mode 100644 data/2022/neurips/Zero-Shot 3D Drug Design by Sketching and Generating
 create mode 100644 data/2022/neurips/Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
 create mode 100644 data/2022/neurips/Zero-Sum Stochastic Stackelberg Games
 create mode 100644 data/2022/neurips/Zero-shot Transfer Learning within a Heterogeneous Graph via Knowledge Transfer Networks
 create mode 100644 data/2022/neurips/ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time
 create mode 100644 data/2022/neurips/ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
 create mode 100644 data/2022/neurips/Zeroth-Order Hard-Thresholding: Gradient Error vs. Expansivity
 create mode 100644 data/2022/neurips/Zeroth-Order Negative Curvature Finding: Escaping Saddle Points without Gradients
 create mode 100644 data/2022/neurips/Zonotope Domains for Lagrangian Neural Network Verification
 create mode 100644 data/2022/neurips/ZooD: Exploiting Model Zoo for Out-of-Distribution Generalization
 create mode 100644 data/2022/neurips/coVariance Neural Networks
 create mode 100644 data/2022/neurips/mRI: Multi-modal 3D Human Pose Estimation Dataset using mmWave, RGB-D, and Inertial Sensors
 create mode 100644 data/2022/neurips/pFL-Bench: A Comprehensive Benchmark for Personalized Federated Learning
 create mode 100644 data/2022/neurips/projUNN: efficient method for training deep networks with unitary matrices
 create mode 100644 data/2022/neurips/pyKT: A Python Library to Benchmark Deep Learning based Knowledge Tracing Models
 create mode 100644 data/2022/neurips/u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
 create mode 100644 data/2022/neurips/xView3-SAR: Detecting Dark Fishing Activity Using Synthetic Aperture Radar Imagery
 create mode 100644 "data/2022/neurips/\360\237\217\230\357\270\217 ProcTHOR: Large-Scale Embodied AI Using Procedural Generation"
 create mode 100644 "data/2023/neurips/\"Why Not Looking backward?\" A Robust Two-Step Method to Automatically Terminate Bayesian Optimization"
 create mode 100644 data/2023/neurips/(Amplified) Banded Matrix Factorization: A unified approach to private training
 create mode 100644 data/2023/neurips/2Direction: Theoretically Faster Distributed Training with Bidirectional Communication Compression
 create mode 100644 data/2023/neurips/3D molecule generation by denoising voxel grids
 create mode 100644 data/2023/neurips/3D-LLM: Injecting the 3D World into Large Language Models
 create mode 100644 data/2023/neurips/4D Panoptic Scene Graph Generation
 create mode 100644 data/2023/neurips/A Bayesian Approach To Analysing Training Data Attribution In Deep Learning
 create mode 100644 data/2023/neurips/A Bounded Ability Estimation for Computerized Adaptive Testing
 create mode 100644 data/2023/neurips/A Cross-Moment Approach for Causal Effect Estimation
 create mode 100644 data/2023/neurips/A Data-Free Approach to Mitigate Catastrophic Forgetting in Federated Class Incremental Learning for Vision Tasks
 create mode 100644 data/2023/neurips/A Dataset for Analyzing Streaming Media Performance over HTTP 3 Browsers
 create mode 100644 data/2023/neurips/A Diffusion-Model of Joint Interactive Navigation
 create mode 100644 data/2023/neurips/A Dynamical System View of Langevin-Based Non-Convex Sampling
 create mode 100644 data/2023/neurips/A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
 create mode 100644 data/2023/neurips/A Framework for Fast and Stable Representations of Multiparameter Persistent Homology Decompositions
 create mode 100644 data/2023/neurips/A General Framework for Robust G-Invariance in G-Equivariant Networks
 create mode 100644 data/2023/neurips/A General Theory of Correct, Incorrect, and Extrinsic Equivariance
 create mode 100644 data/2023/neurips/A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation
 create mode 100644 data/2023/neurips/A Massive Scale Semantic Similarity Dataset of Historical English
 create mode 100644 data/2023/neurips/A Measure-Theoretic Axiomatisation of Causality
 create mode 100644 data/2023/neurips/A Regularized Conditional GAN for Posterior Sampling in Image Recovery Problems
 create mode 100644 data/2023/neurips/A Riemannian Exponential Augmented Lagrangian Method for Computing the Projection Robust Wasserstein Distance
 create mode 100644 data/2023/neurips/A Robust Exact Algorithm for the Euclidean Bipartite Matching Problem
 create mode 100644 data/2023/neurips/A Scalable Neural Network for DSIC Affine Maximizer Auction Design
 create mode 100644 data/2023/neurips/A Scale-Invariant Sorting Criterion to Find a Causal Order in Additive Noise Models
 create mode 100644 data/2023/neurips/A Single-Loop Accelerated Extra-Gradient Difference Algorithm with Improved Complexity Bounds for Constrained Minimax Optimization
 create mode 100644 data/2023/neurips/A Spectral Algorithm for List-Decodable Covariance Estimation in Relative Frobenius Norm
 create mode 100644 data/2023/neurips/A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
 create mode 100644 data/2023/neurips/A Unified Algorithm Framework for Unsupervised Discovery of Skills based on Determinantal Point Process
 create mode 100644 data/2023/neurips/A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing
 create mode 100644 data/2023/neurips/A Unified Model and Dimension for Interactive Estimation
 create mode 100644 data/2023/neurips/A Unified, Scalable Framework for Neural Population Decoding
 create mode 100644 data/2023/neurips/A Unifying Perspective on Multi-Calibration: Game Dynamics for Multi-Objective Learning
 create mode 100644 data/2023/neurips/A fast heuristic to optimize time-space tradeoff for large models
 create mode 100644 data/2023/neurips/A graphon-signal analysis of graph neural networks
 create mode 100644 data/2023/neurips/A new perspective on building efficient and expressive 3D equivariant graph neural networks
 create mode 100644 data/2023/neurips/A polar prediction model for learning to represent visual transformations
 create mode 100644 data/2023/neurips/A unified framework for information-theoretic generalization bounds
 create mode 100644 data/2023/neurips/ADGym: Design Choices for Deep Anomaly Detection
 create mode 100644 data/2023/neurips/AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix
 create mode 100644 data/2023/neurips/AMDP: An Adaptive Detection Procedure for False Discovery Rate Control in High-Dimensional Mediation Analysis
 create mode 100644 data/2023/neurips/ANPL: Towards Natural Programming with Interactive Decomposition
 create mode 100644 data/2023/neurips/ANTN: Bridging Autoregressive Neural Networks and Tensor Networks for Quantum Many-Body Simulation
 create mode 100644 data/2023/neurips/AQuA: A Benchmarking Tool for Label Quality Assessment
 create mode 100644 data/2023/neurips/AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
 create mode 100644 data/2023/neurips/ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition
 create mode 100644 data/2023/neurips/ASPEN: Breaking Operator Barriers for Efficient Parallelization of Deep Neural Networks
 create mode 100644 data/2023/neurips/ATMAN: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
 create mode 100644 data/2023/neurips/AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
 create mode 100644 data/2023/neurips/AVIS: Autonomous Visual Information Seeking with Large Language Model Agent
 create mode 100644 data/2023/neurips/AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web
 create mode 100644 data/2023/neurips/AbDiffuser: full-atom generation of in-vitro functioning antibodies
 create mode 100644 data/2023/neurips/Accelerated On-Device Forward Neural Network Training with Module-Wise Descending Asynchronism
 create mode 100644 data/2023/neurips/Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance
 create mode 100644 data/2023/neurips/Accelerating Motion Planning via Optimal Transport
 create mode 100644 data/2023/neurips/Accessing Higher Dimensions for Unsupervised Word Translation
 create mode 100644 data/2023/neurips/Achieving Cross Modal Generalization with Multimodal Unified Representation
 create mode 100644 data/2023/neurips/Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models
 create mode 100644 data/2023/neurips/Active Vision Reinforcement Learning under Limited Visual Observability
 create mode 100644 data/2023/neurips/Active representation learning for general task space with applications in robotics
 create mode 100644 data/2023/neurips/Activity Grammars for Temporal Action Segmentation
 create mode 100644 data/2023/neurips/AdANNS: A Framework for Adaptive Semantic Search
 create mode 100644 data/2023/neurips/Adapting Neural Link Predictors for Data-Efficient Complex Query Answering
 create mode 100644 data/2023/neurips/Adaptive Principal Component Regression with Applications to Panel Data
 create mode 100644 data/2023/neurips/Adaptive Privacy Composition for Accuracy-first Mechanisms
 create mode 100644 data/2023/neurips/Adaptive Test-Time Personalization for Federated Learning
 create mode 100644 data/2023/neurips/Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels
 create mode 100644 data/2023/neurips/Add and Thin: Diffusion for Temporal Point Processes
 create mode 100644 data/2023/neurips/Addressing Negative Transfer in Diffusion Models
 create mode 100644 data/2023/neurips/Adversarial Counterfactual Environment Model Learning
 create mode 100644 data/2023/neurips/Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces
 create mode 100644 data/2023/neurips/Adversarial Self-Training Improves Robustness and Generalization for Gradual Domain Adaptation
 create mode 100644 data/2023/neurips/Adversarial Training from Mean Field Perspective
 create mode 100644 data/2023/neurips/Adversarially Robust Distributed Count Tracking via Partial Differential Privacy
 create mode 100644 data/2023/neurips/Advice Querying under Budget Constraint for Online Algorithms
 create mode 100644 data/2023/neurips/Affinity-Aware Graph Networks
 create mode 100644 data/2023/neurips/AirDelhi: Fine-Grained Spatio-Temporal Particulate Matter Dataset From Delhi For ML based Modeling
 create mode 100644 data/2023/neurips/AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
 create mode 100644 data/2023/neurips/Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
 create mode 100644 data/2023/neurips/Aligning Gradient and Hessian for Neural Signed Distance Function
 create mode 100644 data/2023/neurips/Aligning Language Models with Human Preferences via a Bayesian Approach
 create mode 100644 data/2023/neurips/Alignment with human representations supports robust few-shot learning
 create mode 100644 data/2023/neurips/All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation
 create mode 100644 data/2023/neurips/Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
 create mode 100644 data/2023/neurips/Alternating Updates for Efficient Transformers
 create mode 100644 data/2023/neurips/Alternation makes the adversary weaker in two-player games
 create mode 100644 data/2023/neurips/American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers
 create mode 100644 data/2023/neurips/Amortized Reparametrization: Efficient and Scalable Variational Inference for Latent SDEs
 create mode 100644 "data/2023/neurips/An Alternating Optimization Method for Bilevel Problems under the Polyak-\305\201ojasiewicz Condition"
 create mode 100644 data/2023/neurips/An Efficient Dataset Condensation Plugin and Its Application to Continual Learning
 create mode 100644 data/2023/neurips/An Exploration-by-Optimization Approach to Best of Both Worlds in Linear Bandits
 create mode 100644 data/2023/neurips/An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits
 create mode 100644 data/2023/neurips/An Optimization-based Approach To Node Role Discovery in Networks: Approximating Equitable Partitions
 create mode 100644 data/2023/neurips/An information-theoretic quantification of the content of communication between brain regions
 create mode 100644 data/2023/neurips/Analyzing Generalization of Neural Networks through Loss Path Kernels
 create mode 100644 data/2023/neurips/Analyzing the Sample Complexity of Self-Supervised Image Reconstruction Methods
 create mode 100644 data/2023/neurips/Anchor Data Augmentation
 create mode 100644 data/2023/neurips/Anonymous and Copy-Robust Delegations for Liquid Democracy
 create mode 100644 data/2023/neurips/Anytime Model Selection in Linear Bandits
 create mode 100644 data/2023/neurips/Anytime-Competitive Reinforcement Learning with Policy Prior
 create mode 100644 data/2023/neurips/Approximate Allocation Matching for Structural Causal Bandits with Unobserved Confounders
 create mode 100644 data/2023/neurips/Approximate inference of marginals using the IBIA framework
 create mode 100644 data/2023/neurips/Are Diffusion Models Vision-And-Language Reasoners?
 create mode 100644 data/2023/neurips/Are Vision Transformers More Data Hungry Than Newborn Visual Systems?
 create mode 100644 data/2023/neurips/Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment
 create mode 100644 data/2023/neurips/Auditing for Human Expertise
 create mode 100644 data/2023/neurips/Augmenting Language Models with Long-Term Memory
 create mode 100644 data/2023/neurips/Auslan-Daily: Australian Sign Language Translation for Daily Communication and News
 create mode 100644 data/2023/neurips/AutoGO: Automated Computation Graph Optimization for Neural Network Evolution
 create mode 100644 data/2023/neurips/Autodecoding Latent 3D Diffusion Models
 create mode 100644 data/2023/neurips/BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks
 create mode 100644 data/2023/neurips/BIOT: Biosignal Transformer for Cross-data Learning in the Wild
 create mode 100644 data/2023/neurips/BQ-NCO: Bisimulation Quotienting for Efficient Neural Combinatorial Optimization
 create mode 100644 data/2023/neurips/Back-Modality: Leveraging Modal Transformation for Data Augmentation
 create mode 100644 data/2023/neurips/Balancing Risk and Reward: A Batched-Bandit Strategy for Automated Phased Release
 create mode 100644 data/2023/neurips/Banana: Banach Fixed-Point Network for Pointcloud Segmentation with Inter-Part Equivariance
 create mode 100644 data/2023/neurips/Bandit Task Assignment with Unknown Processing Time
 create mode 100644 data/2023/neurips/BanditPAM++: Faster k-medoids Clustering
 create mode 100644 data/2023/neurips/BasisFormer: Attention-based Time Series Forecasting with Learnable and Interpretable Basis
 create mode 100644 data/2023/neurips/BayesTune: Bayesian Sparse Deep Model Fine-tuning
 create mode 100644 data/2023/neurips/Bayesian Active Causal Discovery with Multi-Fidelity Experiments
 create mode 100644 data/2023/neurips/Bayesian Learning via Q-Exponential Process
 create mode 100644 data/2023/neurips/Bayesian Metric Learning for Uncertainty Quantification in Image Retrieval
 create mode 100644 data/2023/neurips/Bayesian Risk-Averse Q-Learning with Streaming Observations
 create mode 100644 data/2023/neurips/Bayesian nonparametric (non-)renewal processes for analyzing neural spike train variability
 create mode 100644 data/2023/neurips/BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
 create mode 100644 data/2023/neurips/Benchmarking Foundation Models with Language-Model-as-an-Examiner
 create mode 100644 data/2023/neurips/Better Private Linear Regression Through Better Private Feature Selection
 create mode 100644 data/2023/neurips/Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence
 create mode 100644 data/2023/neurips/Beyond Myopia: Learning from Positive and Unlabeled Data through Holistic Predictive Trends
 create mode 100644 data/2023/neurips/Bias in Evaluation Processes: An Optimization-Based Model
 create mode 100644 data/2023/neurips/Bicriteria Approximation Algorithms for the Submodular Cover Problem
 create mode 100644 data/2023/neurips/Bicriteria Multidimensional Mechanism Design with Side Information
 create mode 100644 data/2023/neurips/Bifurcations and loss jumps in RNN training
 create mode 100644 data/2023/neurips/BioMassters: A Benchmark Dataset for Forest Biomass Estimation using Multi-modal Satellite Time-series
 create mode 100644 data/2023/neurips/Bitstream-Corrupted Video Recovery: A Novel Benchmark Dataset and Method
 create mode 100644 data/2023/neurips/Black-Box Differential Privacy for Interactive ML
 create mode 100644 data/2023/neurips/Block Coordinate Plug-and-Play Methods for Blind Inverse Problems
 create mode 100644 data/2023/neurips/Block-State Transformers
 create mode 100644 data/2023/neurips/Boosting Adversarial Transferability by Achieving Flat Local Maxima
 create mode 100644 data/2023/neurips/Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning
 create mode 100644 data/2023/neurips/Bootstrapped Training of Score-Conditioned Generator for Offline Design of Biological Sequences
 create mode 100644 data/2023/neurips/Boundary Guided Learning-Free Semantic Control with Diffusion Models
 create mode 100644 data/2023/neurips/Bounding training data reconstruction in DP-SGD
 create mode 100644 data/2023/neurips/Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models
 create mode 100644 data/2023/neurips/Breaking the Communication-Privacy-Accuracy Tradeoff with f-Differential Privacy
 create mode 100644 data/2023/neurips/Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
 create mode 100644 data/2023/neurips/Bypassing spike sorting: Density-based decoding using spike localization from dense multielectrode probes
 create mode 100644 data/2023/neurips/Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits
 create mode 100644 data/2023/neurips/C-Disentanglement: Discovering Causally-Independent Generative Factors under an Inductive Bias of Confounder
 create mode 100644 data/2023/neurips/CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
 create mode 100644 data/2023/neurips/CARE: Modeling Interacting Dynamics Under Temporal Environmental Variation
 create mode 100644 data/2023/neurips/CAST: Cross-Attention in Space and Time for Video Action Recognition
 create mode 100644 data/2023/neurips/CEIL: Generalized Contextual Imitation Learning
 create mode 100644 data/2023/neurips/CHAMMI: A benchmark for channel-adaptive models in microscopy imaging
 create mode 100644 data/2023/neurips/COCO-Counterfactuals: Automatically Constructed Counterfactual Examples for Image-Text Pairs
 create mode 100644 data/2023/neurips/COOM: A Game Benchmark for Continual Reinforcement Learning
 create mode 100644 data/2023/neurips/CORL: Research-oriented Deep Offline Reinforcement Learning Library
 create mode 100644 data/2023/neurips/CQM: Curriculum Reinforcement Learning with a Quantized World Model
 create mode 100644 data/2023/neurips/CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganography
 create mode 100644 data/2023/neurips/CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
 create mode 100644 data/2023/neurips/Cal-DETR: Calibrated Detection Transformer
 create mode 100644 data/2023/neurips/Calibrate and Boost Logical Expressiveness of GNN Over Multi-Relational and Temporal Graphs
 create mode 100644 data/2023/neurips/Calibration by Distribution Matching: Trainable Kernel Calibration Metrics
 create mode 100644 data/2023/neurips/CamoPatch: An Evolutionary Strategy for Generating Camoflauged Adversarial Patches
 create mode 100644 data/2023/neurips/Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments
 create mode 100644 data/2023/neurips/Causal discovery from observational and interventional data across multiple environments
 create mode 100644 data/2023/neurips/Causal-structure Driven Augmentations for Text OOD Generalization
 create mode 100644 data/2023/neurips/Characterization and Learning of Causal Graphs with Small Conditioning Sets
 create mode 100644 data/2023/neurips/Characterization of Overfitting in Robust Multiclass Classification
 create mode 100644 data/2023/neurips/Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach
 create mode 100644 data/2023/neurips/ChatGPT-Powered Hierarchical Comparisons for Image Classification
 create mode 100644 data/2023/neurips/Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
 create mode 100644 data/2023/neurips/Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models
 create mode 100644 data/2023/neurips/CityRefer: Geography-aware 3D Visual Grounding Dataset on City-scale Point Cloud Data
 create mode 100644 data/2023/neurips/Class-Conditional Conformal Prediction with Many Classes
 create mode 100644 data/2023/neurips/Classical Simulation of Quantum Circuits: Parallel Environments and Benchmark
 create mode 100644 data/2023/neurips/ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling
 create mode 100644 data/2023/neurips/ClusterFomer: Clustering As A Universal Visual Learner
 create mode 100644 data/2023/neurips/Clustering the Sketch: Dynamic Compression for Embedding Tables
 create mode 100644 data/2023/neurips/CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
 create mode 100644 data/2023/neurips/CoDet: Co-occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
 create mode 100644 data/2023/neurips/CoPriv: Network Protocol Co-Optimization for Communication-Efficient Private Inference
 create mode 100644 data/2023/neurips/Cognitive Steering in Deep Neural Networks via Long-Range Modulatory Feedback Connections
 create mode 100644 data/2023/neurips/Collaborative Alignment of NLP Models
 create mode 100644 data/2023/neurips/Collaborative Learning via Prediction Consensus
 create mode 100644 data/2023/neurips/Collaborative Score Distillation for Consistent Visual Editing
 create mode 100644 data/2023/neurips/Collaboratively Learning Linear Models with Structured Missing Data
 create mode 100644 data/2023/neurips/Collapsed Inference for Bayesian Deep Learning
 create mode 100644 data/2023/neurips/Compact Neural Volumetric Video Representations with Dynamic Codebooks
 create mode 100644 data/2023/neurips/Comparing Causal Frameworks: Potential Outcomes, Structural Models, Graphs, and Abstractions
 create mode 100644 data/2023/neurips/Complementary Benefits of Contrastive Learning and Self-Training Under Distribution Shift
 create mode 100644 data/2023/neurips/Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy
 create mode 100644 data/2023/neurips/Concept Algebra for (Score-Based) Text-Controlled Generative Models
 create mode 100644 data/2023/neurips/Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement
 create mode 100644 data/2023/neurips/Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
 create mode 100644 data/2023/neurips/Conditional Mutual Information for Disentangled Representations in Reinforcement Learning
 create mode 100644 data/2023/neurips/Conformal Prediction for Uncertainty-Aware Planning with Diffusion Dynamics Model
 create mode 100644 data/2023/neurips/Connected Superlevel Set in (Deep) Reinforcement Learning and its Application to Minimax Theorems
 create mode 100644 data/2023/neurips/Connecting Certified and Adversarial Training
 create mode 100644 data/2023/neurips/Conservative State Value Estimation for Offline Reinforcement Learning
 create mode 100644 data/2023/neurips/Constructing Non-isotropic Gaussian Diffusion Model Using Isotropic Gaussian Diffusion Model for Image Editing
 create mode 100644 data/2023/neurips/Context Shift Reduction for Offline Meta-Reinforcement Learning
 create mode 100644 data/2023/neurips/Context-guided Embedding Adaptation for Effective Topic Modeling in Low-Resource Regimes
 create mode 100644 data/2023/neurips/Context-lumpable stochastic bandits
 create mode 100644 data/2023/neurips/Contextual Bandits and Imitation Learning with Preference-Based Active Queries
 create mode 100644 data/2023/neurips/Contextual Stochastic Bilevel Optimization
 create mode 100644 data/2023/neurips/Continuous-Time Functional Diffusion Processes
 create mode 100644 data/2023/neurips/Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time
 create mode 100644 data/2023/neurips/Contrastive Sampling Chains in Diffusion Models
 create mode 100644 data/2023/neurips/Controlling Text-to-Image Diffusion by Orthogonal Finetuning
 create mode 100644 data/2023/neurips/Convex-Concave Zero-Sum Stochastic Stackelberg Games
 create mode 100644 data/2023/neurips/Convolutional Neural Operators for robust and accurate learning of PDEs
 create mode 100644 data/2023/neurips/Convolutional State Space Models for Long-Range Spatiotemporal Modeling
 create mode 100644 data/2023/neurips/Core-sets for Fair and Diverse Data Summarization
 create mode 100644 data/2023/neurips/Correlative Information Maximization: A Biologically Plausible Approach to Supervised Deep Neural Networks without Weight Symmetry
 create mode 100644 data/2023/neurips/CorresNeRF: Image Correspondence Priors for Neural Radiance Fields
 create mode 100644 data/2023/neurips/Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
 create mode 100644 data/2023/neurips/Counterfactual Evaluation of Peer-Review Assignment Policies
 create mode 100644 data/2023/neurips/Coupled Reconstruction of Cortical Surfaces by Diffeomorphic Mesh Deformation
 create mode 100644 data/2023/neurips/Covariance-adaptive best arm identification
 create mode 100644 data/2023/neurips/Creating Multi-Level Skill Hierarchies in Reinforcement Learning
 create mode 100644 data/2023/neurips/Creating a Public Repository for Joining Private Data
 create mode 100644 data/2023/neurips/Cross-Domain Policy Adaptation via Value-Guided Data Filtering
 create mode 100644 data/2023/neurips/Cross-links Matter for Link Prediction: Rethinking the Debiased GNN from a Data Perspective
 create mode 100644 data/2023/neurips/D4Explainer: In-distribution Explanations of Graph Neural Network via Discrete Denoising Diffusion
 create mode 100644 data/2023/neurips/DAC-DETR: Divide the Attention Layers and Conquer
 create mode 100644 data/2023/neurips/DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
 create mode 100644 data/2023/neurips/DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
 create mode 100644 data/2023/neurips/DESSERT: An Efficient Algorithm for Vector Set Search with Vector Set Queries
 create mode 100644 data/2023/neurips/DIFFER: Decomposing Individual Reward for Fair Experience Replay in Multi-Agent Reinforcement Learning
 create mode 100644 data/2023/neurips/DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization
 create mode 100644 data/2023/neurips/DISCS: A Benchmark for Discrete Sampling
 create mode 100644 data/2023/neurips/DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model
 create mode 100644 data/2023/neurips/Data Quality in Imitation Learning
 create mode 100644 data/2023/neurips/Data Selection for Language Models via Importance Resampling
 create mode 100644 data/2023/neurips/Data-driven Optimal Filtering for Linear Systems with Unknown Noise Covariances
 create mode 100644 data/2023/neurips/Dataset Diffusion: Diffusion-based Synthetic Data Generation for Pixel-Level Semantic Segmentation
 create mode 100644 data/2023/neurips/Debiasing Conditional Stochastic Optimization
 create mode 100644 data/2023/neurips/Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards
 create mode 100644 data/2023/neurips/Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models
 create mode 100644 data/2023/neurips/Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory
 create mode 100644 data/2023/neurips/Decompose Novel into Known: Part Concept Learning For 3D Novel Class Discovery
 create mode 100644 data/2023/neurips/Decompose a Task into Generalizable Subtasks in Multi-Agent Reinforcement Learning
 create mode 100644 data/2023/neurips/Deep Contract Design via Discontinuous Networks
 create mode 100644 data/2023/neurips/Deep Fractional Fourier Transform
 create mode 100644 data/2023/neurips/Deep Gaussian Markov Random Fields for Graph-Structured Dynamical Systems
 create mode 100644 data/2023/neurips/Deep Insights into Noisy Pseudo Labeling on Graph Data
 create mode 100644 data/2023/neurips/Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model
 create mode 100644 data/2023/neurips/DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization
 create mode 100644 data/2023/neurips/DeepPCR: Parallelizing Sequential Operations in Neural Networks
 create mode 100644 data/2023/neurips/DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation
 create mode 100644 data/2023/neurips/Delegated Classification
 create mode 100644 data/2023/neurips/Demystifying Structural Disparity in Graph Neural Networks: Can One Size Fit All?
 create mode 100644 data/2023/neurips/Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models
 create mode 100644 data/2023/neurips/Depth-discriminative Metric Learning for Monocular 3D Object Detection
 create mode 100644 data/2023/neurips/Derandomized novelty detection with FDR control via conformal e-values
 create mode 100644 data/2023/neurips/Described Object Detection: Liberating Object Detection with Flexible Expressions
 create mode 100644 data/2023/neurips/DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation
 create mode 100644 data/2023/neurips/DiViNeT: 3D Reconstruction from Disparate Views using Neural Template Regularization
 create mode 100644 data/2023/neurips/Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
 create mode 100644 data/2023/neurips/Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models
 create mode 100644 data/2023/neurips/DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification
 create mode 100644 data/2023/neurips/DiffComplete: Diffusion-based Generative 3D Shape Completion
 create mode 100644 data/2023/neurips/DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology
 create mode 100644 data/2023/neurips/DiffTraj: Generating GPS Trajectory with Diffusion Probabilistic Model
 create mode 100644 data/2023/neurips/Differentiable Clustering with Perturbed Spanning Forests
 create mode 100644 data/2023/neurips/Differentiable sorting for censored time-to-event data
 create mode 100644 "data/2023/neurips/Differentially Private Statistical Inference through \316\262-Divergence One Posterior Sampling"
 create mode 100644 data/2023/neurips/DiffuseBot: Breeding Soft Robots With Physics-Augmented Generative Diffusion Models
 create mode 100644 data/2023/neurips/Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
 create mode 100644 data/2023/neurips/Diffusion Self-Guidance for Controllable Image Generation
 create mode 100644 data/2023/neurips/Direct Preference-based Policy Optimization without Reward Modeling
 create mode 100644 data/2023/neurips/Dis-inhibitory neuronal circuits can control the sign of synaptic plasticity
 create mode 100644 data/2023/neurips/DisDiff: Unsupervised Disentanglement of Diffusion Probabilistic Models
 create mode 100644 data/2023/neurips/Disambiguated Attention Embedding for Multi-Instance Partial-Label Learning
 create mode 100644 data/2023/neurips/Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design
 create mode 100644 data/2023/neurips/Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning
 create mode 100644 data/2023/neurips/Discovering Intrinsic Spatial-Temporal Logic Rules to Explain Human Actions
 create mode 100644 data/2023/neurips/Disentangled Wasserstein Autoencoder for T-Cell Receptor Engineering
 create mode 100644 data/2023/neurips/Disentangling Cognitive Diagnosis with Limited Exercise Labels
 create mode 100644 data/2023/neurips/Disentangling Voice and Content with Self-Supervision for Speaker Recognition
 create mode 100644 data/2023/neurips/Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models
 create mode 100644 data/2023/neurips/Distributed Inference and Fine-tuning of Large Language Models Over The Internet
 create mode 100644 data/2023/neurips/Distributed Personalized Empirical Risk Minimization
 create mode 100644 data/2023/neurips/Distributionally Robust Ensemble of Lottery Tickets Towards Calibrated Sparse Network Training
 create mode 100644 data/2023/neurips/Diverse Shape Completion via Style Modulated Generative Adversarial Networks
 create mode 100644 data/2023/neurips/Diversified Outlier Exposure for Out-of-Distribution Detection via Informative Extrapolation
 create mode 100644 data/2023/neurips/Diversify Your Vision Datasets with Automatic Diffusion-based Augmentation
 create mode 100644 data/2023/neurips/Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback
 create mode 100644 data/2023/neurips/DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
 create mode 100644 data/2023/neurips/Does Invariant Graph Learning via Environment Augmentation Learn Invariance?
 create mode 100644 data/2023/neurips/Does a sparse ReLU network training problem always admit an optimum ?
 create mode 100644 data/2023/neurips/Don't blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy
 create mode 100644 data/2023/neurips/Don't just prune by magnitude! Your mask topology is a secret weapon
 create mode 100644 data/2023/neurips/Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
 create mode 100644 data/2023/neurips/Double Randomized Underdamped Langevin with Dimension-Independent Convergence Guarantee
 create mode 100644 data/2023/neurips/Double and Single Descent in Causal Inference with an Application to High-Dimensional Synthetic Control
 create mode 100644 data/2023/neurips/Doubly Robust Augmented Transfer for Meta-Reinforcement Learning
 create mode 100644 data/2023/neurips/Drift doesn't Matter: Dynamic Decomposition with Diffusion Reconstruction for Unstable Multivariate Time Series Anomaly Detection
 create mode 100644 data/2023/neurips/Dual Self-Awareness Value Decomposition Framework without Individual Global Max for Cooperative MARL
 create mode 100644 data/2023/neurips/DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets
 create mode 100644 data/2023/neurips/DynPoint: Dynamic Neural Point For View Synthesis
 create mode 100644 data/2023/neurips/Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
 create mode 100644 data/2023/neurips/Dynamic Personalized Federated Learning with Adaptive Differential Privacy
 create mode 100644 data/2023/neurips/Dynamic Regret of Adversarial Linear Mixture MDPs
 create mode 100644 data/2023/neurips/Dynamic Sparsity Is Channel-Level Sparsity Learner
 create mode 100644 "data/2023/neurips/D\303\244RF: Boosting Radiance Fields from Sparse Input Views with Monocular Depth Adaptation"
 create mode 100644 data/2023/neurips/E2PNet: Event to Point Cloud Registration with Spatio-Temporal Representation Learning
 create mode 100644 data/2023/neurips/ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram
 create mode 100644 data/2023/neurips/EDGI: Equivariant Diffusion for Planning with Embodied Agents
 create mode 100644 data/2023/neurips/EFWI: Multiparameter Benchmark Datasets for Elastic Full Waveform Inversion of Geophysical Properties
 create mode 100644 data/2023/neurips/EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models
 create mode 100644 data/2023/neurips/ELDEN: Exploration via Local Dependencies
 create mode 100644 data/2023/neurips/Easy Learning from Label Proportions
 create mode 100644 data/2023/neurips/Effective Bayesian Heteroscedastic Regression with Deep Neural Networks
 create mode 100644 data/2023/neurips/Effective Robustness against Natural Distribution Shifts for Models with Different Training Data
 create mode 100644 data/2023/neurips/Effectively Learning Initiation Sets in Hierarchical Reinforcement Learning
 create mode 100644 data/2023/neurips/Efficient Adversarial Contrastive Learning via Robustness-Aware Coreset Selection
 create mode 100644 data/2023/neurips/Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards
 create mode 100644 data/2023/neurips/Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks
 create mode 100644 data/2023/neurips/Efficient Diffusion Policies For Offline Reinforcement Learning
 create mode 100644 data/2023/neurips/Efficient Learning of Linear Graph Neural Networks via Node Subsampling
 create mode 100644 data/2023/neurips/Efficient Model-Free Exploration in Low-Rank MDPs
 create mode 100644 data/2023/neurips/Efficient Potential-based Exploration in Reinforcement Learning using Inverse Dynamic Bisimulation Metric
 create mode 100644 data/2023/neurips/Efficient Robust Bayesian Optimization for Arbitrary Uncertain inputs
 create mode 100644 data/2023/neurips/Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models
 create mode 100644 data/2023/neurips/Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction
 create mode 100644 data/2023/neurips/Efficient Uncertainty Quantification and Reduction for Over-Parameterized Neural Networks
 create mode 100644 data/2023/neurips/Efficiently incorporating quintuple interactions into geometric deep learning force fields
 create mode 100644 data/2023/neurips/EgoTracks: A Long-term Egocentric Visual Object Tracking Dataset
 create mode 100644 data/2023/neurips/Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization
 create mode 100644 data/2023/neurips/Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity
 create mode 100644 data/2023/neurips/Emergent Communication for Rules Reasoning
 create mode 100644 data/2023/neurips/Empowering Convolutional Neural Nets with MetaSin Activation
 create mode 100644 data/2023/neurips/End-To-End Latent Variational Diffusion Models for Inverse Problems in High Energy Physics
 create mode 100644 data/2023/neurips/Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models
 create mode 100644 data/2023/neurips/Energy-Efficient Scheduling with Predictions
 create mode 100644 data/2023/neurips/Enhancing Knowledge Transfer for Task Incremental Learning with Data-free Subnetwork
 create mode 100644 data/2023/neurips/Enhancing Motion Deblurring in High-Speed Scenes with Spike Streams
 create mode 100644 data/2023/neurips/Enhancing Sharpness-Aware Optimization Through Variance Suppression
 create mode 100644 data/2023/neurips/Entropic Neural Optimal Transport via Diffusion Processes
 create mode 100644 data/2023/neurips/Entropy-based Training Methods for Scalable Neural Implicit Samplers
 create mode 100644 data/2023/neurips/Episodic Multi-Task Learning with Heterogeneous Neural Processes
 create mode 100644 data/2023/neurips/Equal Opportunity of Coverage in Fair Regression
 create mode 100644 data/2023/neurips/Ess-InfoGAIL: Semi-supervised Imitation Learning from Imbalanced Demonstrations
 create mode 100644 data/2023/neurips/Estimating Causal Effects Identifiable from a Combination of Observations and Experiments
 create mode 100644 data/2023/neurips/Estimating Koopman operators with sketching to provably learn large scale dynamical systems
 create mode 100644 data/2023/neurips/Estimating Riemannian Metric with Noise-Contaminated Intrinsic Distance
 create mode 100644 data/2023/neurips/Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
 create mode 100644 data/2023/neurips/Evaluating Neuron Interpretation Methods of NLP Models
 create mode 100644 data/2023/neurips/Evaluating Open-QA Evaluation
 create mode 100644 data/2023/neurips/Evaluating Post-hoc Explanations for Graph Neural Networks via Robustness Analysis
 create mode 100644 data/2023/neurips/Evaluating Robustness and Uncertainty of Graph Models Under Structural Distributional Shifts
 create mode 100644 data/2023/neurips/Evaluating Self-Supervised Learning for Molecular Graph Embeddings
 create mode 100644 data/2023/neurips/Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning
 create mode 100644 data/2023/neurips/Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance
 create mode 100644 data/2023/neurips/Expanding Small-Scale Datasets with Guided Imagination
 create mode 100644 data/2023/neurips/Experimental Designs for Heteroskedastic Variance
 create mode 100644 data/2023/neurips/Exploiting Correlated Auxiliary Feedback in Parameterized Bandits
 create mode 100644 data/2023/neurips/Exploiting hidden structures in non-convex games for convergence to Nash equilibrium
 create mode 100644 data/2023/neurips/Exploring Loss Functions for Time-based Training Strategy in Spiking Neural Networks
 create mode 100644 data/2023/neurips/Exploring Question Decomposition for Zero-Shot VQA
 create mode 100644 data/2023/neurips/Exponentially Convergent Algorithms for Supervised Matrix Factorization
 create mode 100644 data/2023/neurips/Exposing Attention Glitches with Flip-Flop Language Modeling
 create mode 100644 data/2023/neurips/Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models
 create mode 100644 data/2023/neurips/Expressive Sign Equivariant Networks for Spectral Geometric Learning
 create mode 100644 data/2023/neurips/Expressivity-Preserving GNN Simulation
 create mode 100644 data/2023/neurips/FAMO: Fast Adaptive Multitask Optimization
 create mode 100644 data/2023/neurips/FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation
 create mode 100644 data/2023/neurips/FIND: A Function Description Benchmark for Evaluating Interpretability Methods
 create mode 100644 data/2023/neurips/FIRAL: An Active Learning Algorithm for Multinomial Logistic Regression
 create mode 100644 data/2023/neurips/FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout
 create mode 100644 data/2023/neurips/Facing Off World Model Backbones: RNNs, Transformers, and S4
 create mode 100644 data/2023/neurips/Failure-Aware Gaussian Process Optimization with Regret Bounds
 create mode 100644 data/2023/neurips/Fair Graph Distillation
 create mode 100644 data/2023/neurips/Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint
 create mode 100644 data/2023/neurips/Fairly Recommending with Social Attributes: A Flexible and Controllable Optimization Approach
 create mode 100644 data/2023/neurips/Fairness Continual Learning Approach to Semantic Scene Understanding in Open-World Environments
 create mode 100644 data/2023/neurips/Faith and Fate: Limits of Transformers on Compositionality
create mode 100644 data/2023/neurips/False Discovery Proportion control for aggregated Knockoffs create mode 100644 data/2023/neurips/Fast Approximation of Similarity Graphs with Kernel Density Estimation create mode 100644 data/2023/neurips/Fast Model DeBias with Machine Unlearning create mode 100644 data/2023/neurips/Fast Projected Newton-like Method for Precision Matrix Estimation under Total Positivity create mode 100644 data/2023/neurips/Fast Scalable and Accurate Discovery of DAGs Using the Best Order Score Search and Grow Shrink Trees create mode 100644 data/2023/neurips/Faster Differentially Private Convex Optimization via Second-Order Methods create mode 100644 data/2023/neurips/Faster Discrete Convex Function Minimization with Predictions: The M-Convex Case create mode 100644 data/2023/neurips/Faster approximate subgraph counts with privacy create mode 100644 data/2023/neurips/Feature Learning for Interpretable, Performant Decision Trees create mode 100644 data/2023/neurips/Feature Likelihood Score: Evaluating the Generalization of Generative Models Using Samples create mode 100644 data/2023/neurips/Feature Selection in the Contrastive Analysis Setting create mode 100644 data/2023/neurips/Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer create mode 100644 data/2023/neurips/FedGCN: Convergence-Communication Tradeoffs in Federated Training of Graph Convolutional Networks create mode 100644 data/2023/neurips/FedNAR: Federated Optimization with Normalized Annealing Regularization create mode 100644 data/2023/neurips/Federated Linear Bandits with Finite Adversarial Actions create mode 100644 data/2023/neurips/Finding Local Minima Efficiently in Decentralized Optimization create mode 100644 data/2023/neurips/Finding Safe Zones of Markov Decision Processes Policies create mode 100644 data/2023/neurips/Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator create mode 100644 data/2023/neurips/Fine-Grained Visual Prompting create mode 100644 data/2023/neurips/Finite Population Regression Adjustment and Non-asymptotic Guarantees for Treatment Effect Estimation create mode 100644 data/2023/neurips/Flat Seeking Bayesian Neural Networks create mode 100644 data/2023/neurips/Flocks of Stochastic Parrots: Differentially Private Prompt Learning for Large Language Models create mode 100644 data/2023/neurips/Flow Factorized Representation Learning create mode 100644 data/2023/neurips/Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection create mode 100644 data/2023/neurips/Flow: Per-instance Personalized Federated Learning create mode 100644 data/2023/neurips/Focus Your Attention when Few-Shot Classification create mode 100644 data/2023/neurips/Focus on Query: Adversarial Mining Transformer for Few-Shot Segmentation create mode 100644 data/2023/neurips/Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts create mode 100644 data/2023/neurips/Formulating Discrete Probability Flow Through Optimal Transport create mode 100644 data/2023/neurips/FourierGNN: Rethinking Multivariate Time Series Forecasting from a Pure Graph Perspective create mode 100644 data/2023/neurips/FourierHandFlow: Neural 4D Hand Representation Using Fourier Query Flow create mode 100644 data/2023/neurips/Framework and Benchmarks for Combinatorial and Mixed-variable Bayesian Optimization create mode 100644 data/2023/neurips/Frequency Domain-Based Dataset Distillation create mode 100644 
data/2023/neurips/Frequency-domain MLPs are More Effective Learners in Time Series Forecasting create mode 100644 data/2023/neurips/From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Models to Pre-trained Machine Reader create mode 100644 data/2023/neurips/From Trainable Negative Depth to Edge Heterophily in Graphs create mode 100644 data/2023/neurips/Full-Atom Protein Pocket Design via Iterative Refinement create mode 100644 data/2023/neurips/Function Space Bayesian Pseudocoreset for Bayesian Neural Networks create mode 100644 data/2023/neurips/Functional Equivalence and Path Connectivity of Reducible Hyperbolic Tangent Networks create mode 100644 data/2023/neurips/GAIA: Delving into Gradient-based Attribution Abnormality for Out-of-distribution Detection create mode 100644 data/2023/neurips/GAUCHE: A Library for Gaussian Processes in Chemistry create mode 100644 data/2023/neurips/GLEMOS: Benchmark for Instantaneous Graph Learning Model Selection create mode 100644 data/2023/neurips/GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER create mode 100644 data/2023/neurips/GMSF: Global Matching Scene Flow create mode 100644 data/2023/neurips/GPEX, A Framework For Interpreting Artificial Neural Networks create mode 100644 data/2023/neurips/GPT-ST: Generative Pre-Training of Spatio-Temporal Graph Neural Networks create mode 100644 data/2023/neurips/GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction create mode 100644 data/2023/neurips/Gacs-Korner Common Information Variational Autoencoder create mode 100644 data/2023/neurips/Gaussian Membership Inference Privacy create mode 100644 data/2023/neurips/Gaussian Partial Information Decomposition: Bias Correction and Application to High-dimensional Data create mode 100644 data/2023/neurips/Gaussian Process Probes (GPP) for Uncertainty-Aware Probing create mode 100644 data/2023/neurips/GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image create mode 100644 data/2023/neurips/GenS: Generalizable Neural Surface Reconstruction from Multi-View Images create mode 100644 data/2023/neurips/Generalized Bayesian Inference for Scientific Simulators via Amortized Cost Estimation create mode 100644 data/2023/neurips/Generalized Belief Transport create mode 100644 data/2023/neurips/Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models create mode 100644 data/2023/neurips/Generalized Weighted Path Consistency for Mastering Atari Games create mode 100644 data/2023/neurips/Generalized equivalences between subsampling and ridge regularization create mode 100644 data/2023/neurips/Generalized test utilities for long-tail performance in extreme multi-label classification create mode 100644 data/2023/neurips/Generating Behaviorally Diverse Policies with Latent Diffusion Models create mode 100644 data/2023/neurips/Generator Identification for Linear SDEs with Additive and Multiplicative Noise create mode 100644 data/2023/neurips/GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition create mode 100644 data/2023/neurips/Geodesic Multi-Modal Mixup for Robust Fine-Tuning create mode 100644 data/2023/neurips/Geometric Analysis of Matrix Sensing over Graphs create mode 100644 data/2023/neurips/Global Convergence Analysis of Local SGD for Two-layer Neural Network without Overparameterization create mode 100644 "data/2023/neurips/Global Identifiability of \360\235\223\2011-based Dictionary Learning via Matrix Volume 
Optimization" create mode 100644 data/2023/neurips/Global Structure-Aware Diffusion Process for Low-light Image Enhancement create mode 100644 data/2023/neurips/Gradient-Based Feature Learning under Structured Data create mode 100644 data/2023/neurips/Grammar Prompting for Domain-Specific Language Generation with Large Language Models create mode 100644 data/2023/neurips/Granger Components Analysis: Unsupervised learning of latent temporal dependencies create mode 100644 data/2023/neurips/Graph Neural Networks for Road Safety Modeling: Datasets and Evaluations for Accident Analysis create mode 100644 data/2023/neurips/Grassmann Manifold Flows for Stable Shape Generation create mode 100644 data/2023/neurips/Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents create mode 100644 data/2023/neurips/Group Fairness in Peer Review create mode 100644 data/2023/neurips/Guiding The Last Layer in Federated Learning with Pre-Trained Models create mode 100644 data/2023/neurips/H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation create mode 100644 data/2023/neurips/HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding create mode 100644 data/2023/neurips/HOH: Markerless Multimodal Human-Object-Human Handover Dataset with Large Object Count create mode 100644 data/2023/neurips/Hardware Resilience Properties of Text-Guided Image Classifiers create mode 100644 data/2023/neurips/Harnessing the power of choices in decision tree learning create mode 100644 data/2023/neurips/HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation create mode 100644 data/2023/neurips/Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality create mode 100644 data/2023/neurips/Hierarchical Multi-Agent Skill Discovery create mode 100644 data/2023/neurips/Hierarchical Semi-Implicit Variational Inference with Application to Diffusion Model Acceleration create mode 100644 data/2023/neurips/High Precision Causal Model Evaluation with Conditional Randomization create mode 100644 data/2023/neurips/High-dimensional Asymptotics of Denoising Autoencoders create mode 100644 data/2023/neurips/Holistic Evaluation of Text-to-Image Models create mode 100644 data/2023/neurips/Homotopy-based training of NeuralODEs for accurate dynamics discovery create mode 100644 data/2023/neurips/How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources create mode 100644 data/2023/neurips/How Re-sampling Helps for Long-Tail Learning? 
create mode 100644 data/2023/neurips/How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model create mode 100644 data/2023/neurips/How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization create mode 100644 data/2023/neurips/How to Scale Your EMA create mode 100644 data/2023/neurips/How to Turn Your Knowledge Graph Embeddings into Generative Models create mode 100644 data/2023/neurips/HubRouter: Learning Global Routing via Hub Generation and Pin-hub Connection create mode 100644 data/2023/neurips/Human-Guided Complexity-Controlled Abstractions create mode 100644 data/2023/neurips/Human-in-the-Loop Optimization for Deep Stimulus Encoding in Visual Prostheses create mode 100644 data/2023/neurips/Hyper-Skin: A Hyperspectral Dataset for Reconstructing Facial Skin-Spectra from RGB Images create mode 100644 data/2023/neurips/Hyperbolic Space with Hierarchical Margin Boosts Fine-Grained Learning from Coarse Labels create mode 100644 data/2023/neurips/IBA: Towards Irreversible Backdoor Attacks in Federated Learning create mode 100644 data/2023/neurips/ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets create mode 100644 data/2023/neurips/IPMix: Label-Preserving Data Augmentation Method for Training Robust Classifiers create mode 100644 data/2023/neurips/ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification create mode 100644 data/2023/neurips/ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation create mode 100644 data/2023/neurips/Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion create mode 100644 data/2023/neurips/Imitation Learning from Imperfection: Theoretical Justifications and Algorithms create mode 100644 data/2023/neurips/Imitation Learning from Vague Feedback create mode 100644 data/2023/neurips/Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability create mode 100644 data/2023/neurips/Implicit Manifold Gaussian Process Regression create mode 100644 data/2023/neurips/Implicit Regularization in Over-Parameterized Support Vector Machine create mode 100644 data/2023/neurips/Implicit Variational Inference for High-Dimensional Posteriors create mode 100644 data/2023/neurips/Implicit variance regularization in non-contrastive SSL create mode 100644 data/2023/neurips/Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures create mode 100644 data/2023/neurips/Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition create mode 100644 data/2023/neurips/Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms create mode 100644 data/2023/neurips/Improved Convergence in High Probability of Clipped Gradient Methods with Heavy Tailed Noise create mode 100644 data/2023/neurips/Improving Robustness with Adaptive Weight Decay create mode 100644 data/2023/neurips/Improving neural network representations using human similarity judgments create mode 100644 data/2023/neurips/In Defense of Softmax Parametrization for Calibrated and Consistent Learning to Defer create mode 100644 data/2023/neurips/In-Context Impersonation Reveals Large Language Models' Strengths and Biases create mode 100644 data/2023/neurips/In-Context Learning Unlocked for Diffusion Models create mode 100644 data/2023/neurips/Individual Arbitrariness 
and Group Fairness create mode 100644 data/2023/neurips/Inferring Hybrid Neural Fluid Fields from Videos create mode 100644 data/2023/neurips/Inferring the Future by Imagining the Past create mode 100644 data/2023/neurips/InfoCD: A Contrastive Chamfer Distance Loss for Point Cloud Completion create mode 100644 data/2023/neurips/InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding create mode 100644 data/2023/neurips/Information Geometry of the Retinal Representation Manifold create mode 100644 data/2023/neurips/Information-guided Planning: An Online Approach for Partially Observable Problems create mode 100644 data/2023/neurips/Initialization-Dependent Sample Complexity of Linear Predictors and Neural Networks create mode 100644 data/2023/neurips/Inner Product-based Neural Network Similarity create mode 100644 data/2023/neurips/InsActor: Instruction-driven Physics-based Characters create mode 100644 data/2023/neurips/Inserting Anybody in Diffusion Models via Celeb Basis create mode 100644 data/2023/neurips/Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision create mode 100644 data/2023/neurips/Interactive Visual Reasoning under Uncertainty create mode 100644 data/2023/neurips/Interpretability at Scale: Identifying Causal Mechanisms in Alpaca create mode 100644 data/2023/neurips/Interpretable Prototype-based Graph Information Bottleneck create mode 100644 data/2023/neurips/Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts create mode 100644 data/2023/neurips/Invariant Learning via Probability of Sufficient and Necessary Causes create mode 100644 data/2023/neurips/Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation create mode 100644 data/2023/neurips/Inverse Reinforcement Learning with the Average Reward Criterion create mode 100644 data/2023/neurips/Iterative Reachability Estimation for Safe Reinforcement Learning create mode 100644 data/2023/neurips/Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels create mode 100644 data/2023/neurips/Jailbroken: How Does LLM Safety Training Fail? create mode 100644 data/2023/neurips/K-Nearest-Neighbor Local Sampling Based Conditional Independence Testing create mode 100644 data/2023/neurips/KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs create mode 100644 data/2023/neurips/Kernel Quadrature with Randomly Pivoted Cholesky create mode 100644 data/2023/neurips/Kiki or Bouba? 
Sound Symbolism in Vision-and-Language Models create mode 100644 data/2023/neurips/Kissing to Find a Match: Efficient Low-Rank Permutation Representation create mode 100644 data/2023/neurips/Knowledge Diffusion for Distillation create mode 100644 data/2023/neurips/Knowledge Distillation Performs Partial Variance Reduction create mode 100644 data/2023/neurips/L-CAD: Language-based Colorization with Any-level Descriptions using Diffusion Priors create mode 100644 data/2023/neurips/L2-Uniform Stability of Randomized Learning Algorithms: Sharper Generalization Bounds and Confidence Boosting create mode 100644 data/2023/neurips/LEACE: Perfect linear concept erasure in closed form create mode 100644 data/2023/neurips/LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning create mode 100644 data/2023/neurips/LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation create mode 100644 data/2023/neurips/LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching create mode 100644 data/2023/neurips/Label Poisoning is All You Need create mode 100644 data/2023/neurips/Label-Only Model Inversion Attacks via Knowledge Transfer create mode 100644 data/2023/neurips/Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels create mode 100644 data/2023/neurips/LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite create mode 100644 data/2023/neurips/Langevin Quasi-Monte Carlo create mode 100644 data/2023/neurips/Language Is Not All You Need: Aligning Perception with Language Models create mode 100644 data/2023/neurips/Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting create mode 100644 data/2023/neurips/Language Models Meet World Models: Embodied Experiences Enhance Language Models create mode 100644 data/2023/neurips/Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment create mode 100644 data/2023/neurips/Language-based Action Concept Spaces Improve Video Self-Supervised Learning create mode 100644 data/2023/neurips/Large Language Models Are Semi-Parametric Reinforcement Learning Agents create mode 100644 data/2023/neurips/Large Language Models are Visual Reasoning Coordinators create mode 100644 data/2023/neurips/Large Language Models can Implement Policy Iteration create mode 100644 data/2023/neurips/LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting create mode 100644 data/2023/neurips/Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs create mode 100644 data/2023/neurips/Latent SDEs on Homogeneous Spaces create mode 100644 data/2023/neurips/Latent exploration for Reinforcement Learning create mode 100644 data/2023/neurips/Layer-Neighbor Sampling - Defusing Neighborhood Explosion in GNNs create mode 100644 data/2023/neurips/Learn to Categorize or Categorize to Learn? 
Self-Coding for Generalized Category Discovery create mode 100644 data/2023/neurips/Learning Adaptive Tensorial Density Fields for Clean Cryo-ET Reconstruction create mode 100644 data/2023/neurips/Learning Causal Models under Independent Changes create mode 100644 data/2023/neurips/Learning Cuts via Enumeration Oracles create mode 100644 data/2023/neurips/Learning DAGs from Data with Few Root Causes create mode 100644 data/2023/neurips/Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization create mode 100644 data/2023/neurips/Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation create mode 100644 data/2023/neurips/Learning Functional Transduction create mode 100644 data/2023/neurips/Learning Human Action Recognition Representations Without Real Humans create mode 100644 data/2023/neurips/Learning Invariant Molecular Representation in Latent Discrete Space create mode 100644 data/2023/neurips/Learning Large Graph Property Prediction via Graph Segment Training create mode 100644 data/2023/neurips/Learning Large-Scale MTP2 Gaussian Graphical Models via Bridge-Block Decomposition create mode 100644 data/2023/neurips/Learning Large-scale Neural Fields via Context Pruned Meta-Learning create mode 100644 data/2023/neurips/Learning List-Level Domain-Invariant Representations for Ranking create mode 100644 data/2023/neurips/Learning Motion Refinement for Unsupervised Face Animation create mode 100644 data/2023/neurips/Learning Rate Free Bayesian Inference in Constrained Domains create mode 100644 data/2023/neurips/Learning Regularized Monotone Graphon Mean-Field Games create mode 100644 data/2023/neurips/Learning Repeatable Speech Embeddings Using An Intra-class Correlation Regularizer create mode 100644 data/2023/neurips/Learning Shared Safety Constraints from Multi-task Demonstrations create mode 100644 data/2023/neurips/Learning Time-Invariant Representations for Individual Neurons from Population Dynamics create mode 100644 data/2023/neurips/Learning Trajectories are Generalization Indicators create mode 100644 data/2023/neurips/Learning Universal Policies via Text-Guided Video Generation create mode 100644 data/2023/neurips/Learning Visual Prior via Generative Pre-Training create mode 100644 data/2023/neurips/Learning from Active Human Involvement through Proxy Value Propagation create mode 100644 data/2023/neurips/Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection create mode 100644 data/2023/neurips/Learning non-Markovian Decision-Making from State-only Sequences create mode 100644 data/2023/neurips/Learning to Augment Distributions for Out-of-distribution Detection create mode 100644 data/2023/neurips/Learning to Group Auxiliary Datasets for Molecule create mode 100644 data/2023/neurips/Learning to Parameterize Visual Attributes for Open-set Fine-grained Retrieval create mode 100644 data/2023/neurips/Learning to Receive Help: Intervention-Aware Concept Embedding Models create mode 100644 data/2023/neurips/Learning to Taste: A Multimodal Wine Dataset create mode 100644 data/2023/neurips/Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification create mode 100644 data/2023/neurips/Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning create mode 100644 data/2023/neurips/Leveraging the two-timescale regime to demonstrate convergence of neural networks create mode 100644 data/2023/neurips/Lexinvariant 
Language Models create mode 100644 data/2023/neurips/LinkerNet: Fragment Poses and Linker Co-Design with 3D Equivariant Diffusion create mode 100644 data/2023/neurips/LithoBench: Benchmarking AI Computational Lithography for Semiconductor Manufacturing create mode 100644 data/2023/neurips/Lo-Hi: Practical ML Drug Discovery Benchmark create mode 100644 data/2023/neurips/LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning create mode 100644 data/2023/neurips/Lockdown: Backdoor Defense for Federated Learning with Isolated Subspace Training create mode 100644 data/2023/neurips/LogSpecT: Feasible Graph Learning Model from Stationary Signals with Recovery Guarantees create mode 100644 data/2023/neurips/Logarithmic Bayes Regret Bounds create mode 100644 data/2023/neurips/Long-Term Fairness with Unknown Dynamics create mode 100644 data/2023/neurips/Loss Dynamics of Temporal Difference Reinforcement Learning create mode 100644 data/2023/neurips/Lossy Image Compression with Conditional Diffusion Models create mode 100644 data/2023/neurips/Low-shot Object Learning with Mutual Exclusivity Bias create mode 100644 data/2023/neurips/Lower Bounds on Adaptive Sensing for Matrix Recovery create mode 100644 data/2023/neurips/LuminAIRe: Illumination-Aware Conditional Image Repainting for Lighting-Realistic Generation create mode 100644 data/2023/neurips/Lung250M-4B: A Combined 3D Dataset for CT- and Point Cloud-Based Intra-Patient Lung Registration create mode 100644 data/2023/neurips/M2Hub: Unlocking the Potential of Machine Learning for Materials Discovery create mode 100644 data/2023/neurips/M5HisDoc: A Large-scale Multi-style Chinese Historical Document Analysis Benchmark create mode 100644 data/2023/neurips/MADLAD-400: A Multilingual And Document-Level Large Audited Dataset create mode 100644 data/2023/neurips/MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers create mode 100644 data/2023/neurips/MG-ViT: A Multi-Granularity Method for Compact and Efficient Vision Transformers create mode 100644 data/2023/neurips/MMD-Fuse: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting create mode 100644 data/2023/neurips/MVDoppler: Unleashing the Power of Multi-View Doppler for MicroMotion-based Gait Classification create mode 100644 data/2023/neurips/Machine learning detects terminal singularities create mode 100644 data/2023/neurips/Macro Placement by Wire-Mask-Guided Black-Box Optimization create mode 100644 data/2023/neurips/Many-body Approximation for Non-negative Tensors create mode 100644 data/2023/neurips/Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack create mode 100644 data/2023/neurips/Masked Space-Time Hash Encoding for Efficient Dynamic Scene Reconstruction create mode 100644 data/2023/neurips/MathNAS: If Blocks Have a Role in Mathematical Architecture Design create mode 100644 data/2023/neurips/Max-Sliced Mutual Information create mode 100644 data/2023/neurips/Maximization of Average Precision for Deep Learning with Adversarial Ranking Robustness create mode 100644 data/2023/neurips/Maximum Independent Set: Self-Training through Dynamic Programming create mode 100644 data/2023/neurips/May the Force be with You: Unified Force-Centric Pre-Training for 3D Molecular Conformations create mode 100644 data/2023/neurips/MeGraph: Capturing Long-Range Interactions by Alternating Local and Hierarchical Aggregation on Multi-Scaled Graph Hierarchy create mode 100644 data/2023/neurips/MedSat: A Public Health Dataset for 
England Featuring Medical Prescriptions and Satellite Imagery create mode 100644 data/2023/neurips/Meta-AdaM: An Meta-Learned Adaptive Optimizer with Momentum for Few-Shot Learning create mode 100644 data/2023/neurips/Meta-Learning with Neural Bandit Scheduler create mode 100644 data/2023/neurips/Meta-in-context learning in large language models create mode 100644 data/2023/neurips/Meta-learning families of plasticity rules in recurrent spiking networks using simulation-based inference create mode 100644 data/2023/neurips/Metis: Understanding and Enhancing In-Network Regular Expressions create mode 100644 data/2023/neurips/Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation create mode 100644 data/2023/neurips/Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension create mode 100644 data/2023/neurips/Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks create mode 100644 data/2023/neurips/Minimax Forward and Backward Learning of Evolving Tasks with Performance Guarantees create mode 100644 data/2023/neurips/Minimum Description Length and Generalization Guarantees for Representation Learning create mode 100644 data/2023/neurips/Minimum-Risk Recalibration of Classifiers create mode 100644 data/2023/neurips/Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals create mode 100644 data/2023/neurips/Mitigating Test-Time Bias for Fair Image Retrieval create mode 100644 data/2023/neurips/Mitigating the Effect of Incidental Correlations on Part-based Learning create mode 100644 data/2023/neurips/Mitigating the Popularity Bias of Graph Collaborative Filtering: A Dimensional Collapse Perspective create mode 100644 data/2023/neurips/MixFormerV2: Efficient Fully Transformer Tracking create mode 100644 data/2023/neurips/Mnemosyne: Learning to Train Transformers with Transformers create mode 100644 data/2023/neurips/MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks create mode 100644 data/2023/neurips/Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder create mode 100644 data/2023/neurips/Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser create mode 100644 data/2023/neurips/Model and Feature Diversity for Bayesian Neural Networks in Mutual Learning create mode 100644 data/2023/neurips/Model-Based Control with Sparse Neural Dynamics create mode 100644 data/2023/neurips/Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms create mode 100644 data/2023/neurips/Model-free Posterior Sampling via Learning Rate Randomization create mode 100644 data/2023/neurips/Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder create mode 100644 data/2023/neurips/Module-wise Adaptive Distillation for Multimodality Foundation Models create mode 100644 data/2023/neurips/Moment Matching Denoising Gibbs Sampling create mode 100644 data/2023/neurips/MomentDiff: Generative Video Moment Retrieval from Random to Real create mode 100644 data/2023/neurips/Momentum Provably Improves Error Feedback! 
create mode 100644 data/2023/neurips/Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture create mode 100644 data/2023/neurips/Monte Carlo Tree Search with Boltzmann Exploration create mode 100644 data/2023/neurips/Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset create mode 100644 data/2023/neurips/Multi-Agent Learning with Heterogeneous Linear Contextual Bandits create mode 100644 data/2023/neurips/Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity create mode 100644 data/2023/neurips/Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation create mode 100644 data/2023/neurips/Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation create mode 100644 data/2023/neurips/Multi-modal Queried Object Detection in the Wild create mode 100644 data/2023/neurips/Multi-scale Diffusion Denoised Smoothing create mode 100644 data/2023/neurips/Multiclass Boosting: Simple and Intuitive Weak Learning Criteria create mode 100644 data/2023/neurips/Multinomial Logistic Regression: Asymptotic Normality on Null Covariates in High-Dimensions create mode 100644 data/2023/neurips/Multiply Robust Federated Estimation of Targeted Average Treatment Effects create mode 100644 data/2023/neurips/NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations create mode 100644 data/2023/neurips/NCDL: A Framework for Deep Learning on non-Cartesian Lattices create mode 100644 data/2023/neurips/Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation create mode 100644 data/2023/neurips/Near-Linear Time Algorithm for the Chamfer Distance create mode 100644 data/2023/neurips/Near-Optimal k-Clustering in the Sliding Window Model create mode 100644 data/2023/neurips/Nearest Neighbour with Bandit Feedback create mode 100644 data/2023/neurips/Nearly Optimal Bounds for Cyclic Forgetting create mode 100644 data/2023/neurips/Necessary and Sufficient Conditions for Optimal Decision Trees using Dynamic Programming create mode 100644 data/2023/neurips/Neural Circuits for Fast Poisson Compressed Sensing in the Olfactory Bulb create mode 100644 data/2023/neurips/Neural Data Transformer 2: Multi-context Pretraining for Neural Spiking Activity create mode 100644 data/2023/neurips/Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes create mode 100644 data/2023/neurips/Neural Functional Transformers create mode 100644 data/2023/neurips/Neural Ideal Large Eddy Simulation: Modeling Turbulence with Neural Stochastic Differential Equations create mode 100644 data/2023/neurips/Neural Image Compression: Generalization, Robustness, and Spectral Biases create mode 100644 data/2023/neurips/Neural Lyapunov Control for Discrete-Time Systems create mode 100644 data/2023/neurips/Neural Modulation for Flash Memory: An Unsupervised Learning Framework for Improved Reliability create mode 100644 data/2023/neurips/Neural Priming for Sample-Efficient Adaptation create mode 100644 data/2023/neurips/Neural Processes with Stability create mode 100644 data/2023/neurips/Neural Sampling in Hierarchical Exponential-family Energy-based Models create mode 100644 data/2023/neurips/NeuralGF: Unsupervised Point Normal Estimation by Learning Neural Gradient Function create mode 100644 data/2023/neurips/NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep Learning Applications create mode 100644 data/2023/neurips/New Bounds for Hyperparameter Tuning 
of Regression Problems Across Instances create mode 100644 data/2023/neurips/No Change, No Gain: Empowering Graph Neural Networks with Expected Model Change Maximization for Active Learning create mode 100644 data/2023/neurips/Nominality Score Conditioned Time Series Anomaly Detection by Point Sequential Reconstruction create mode 100644 data/2023/neurips/Non-Asymptotic Analysis of a UCB-based Top Two Algorithm create mode 100644 data/2023/neurips/Norm-based Generalization Bounds for Sparse Neural Networks create mode 100644 data/2023/neurips/Normalization Layers Are All That Sharpness-Aware Minimization Needs create mode 100644 data/2023/neurips/Normalizing flow neural networks by JKO scheme create mode 100644 data/2023/neurips/Not All Neuro-Symbolic Concepts Are Created Equal: Analysis and Mitigation of Reasoning Shortcuts create mode 100644 data/2023/neurips/OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents create mode 100644 data/2023/neurips/ODE-based Recurrent Model-free Reinforcement Learning for POMDPs create mode 100644 data/2023/neurips/OV-PARTS: Towards Open-Vocabulary Part Segmentation create mode 100644 data/2023/neurips/Object Reprojection Error (ORE): Camera pose benchmarks from lightweight tracking annotations create mode 100644 data/2023/neurips/Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving create mode 100644 data/2023/neurips/OceanBench: The Sea Surface Height Edition create mode 100644 data/2023/neurips/Offline Imitation Learning with Variational Counterfactual Reasoning create mode 100644 data/2023/neurips/Offline RL with Discrete Proxy Representations for Generalizability in POMDPs create mode 100644 data/2023/neurips/On Convergence of Polynomial Approximations to the Gaussian Mixture Entropy create mode 100644 data/2023/neurips/On Differentially Private Sampling from Gaussian and Product Distributions create mode 100644 data/2023/neurips/On Generalization Bounds for Projective Clustering create mode 100644 data/2023/neurips/On Masked Pre-training and the Marginal Likelihood create mode 100644 data/2023/neurips/On Robust Streaming for Learning with Experts: Algorithms and Lower Bounds create mode 100644 data/2023/neurips/On permutation symmetries in Bayesian neural network posteriors: a variational perspective create mode 100644 data/2023/neurips/On student-teacher deviations in distillation: does it pay to disobey? 
create mode 100644 data/2023/neurips/On the Adversarial Robustness of Out-of-distribution Generalization Models create mode 100644 data/2023/neurips/On the Complexity of Differentially Private Best-Arm Identification with Fixed Confidence create mode 100644 data/2023/neurips/On the Connection between Pre-training Data Diversity and Fine-tuning Robustness create mode 100644 data/2023/neurips/On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms create mode 100644 data/2023/neurips/On the Generalization Error of Stochastic Mirror Descent for Quadratically-Bounded Losses: an Improved Analysis create mode 100644 data/2023/neurips/On the Identifiability and Interpretability of Gaussian Process Models create mode 100644 data/2023/neurips/On the Identifiability of Sparse ICA without Assuming Non-Gaussianity create mode 100644 data/2023/neurips/On the Overlooked Structure of Stochastic Gradients create mode 100644 data/2023/neurips/On the Planning Abilities of Large Language Models - A Critical Investigation create mode 100644 data/2023/neurips/On the Robustness of Removal-Based Feature Attributions create mode 100644 data/2023/neurips/On the Role of Noise in the Sample Complexity of Learning Recurrent Neural Networks: Exponential Gaps for Long Sequences create mode 100644 data/2023/neurips/On the Role of Randomization in Adversarially Robust Classification create mode 100644 data/2023/neurips/On the Trade-off of Intra- Inter-class Diversity for Supervised Pre-training create mode 100644 data/2023/neurips/On the spectral bias of two-layer linear networks create mode 100644 data/2023/neurips/One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning create mode 100644 data/2023/neurips/One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation create mode 100644 data/2023/neurips/One-step differentiation of iterative algorithms create mode 100644 data/2023/neurips/OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling create mode 100644 data/2023/neurips/Online Constrained Meta-Learning: Provable Guarantees for Generalization create mode 100644 data/2023/neurips/Online Control for Meta-optimization create mode 100644 data/2023/neurips/Online Label Shift: Optimal Dynamic Regret meets Practical Algorithms create mode 100644 data/2023/neurips/Online Learning under Adversarial Nonlinear Constraints create mode 100644 data/2023/neurips/Online POMDP Planning with Anytime Deterministic Guarantees create mode 100644 data/2023/neurips/Online robust non-stationary estimation create mode 100644 data/2023/neurips/Open Compound Domain Adaptation with Object Style Compensation for Semantic Segmentation create mode 100644 data/2023/neurips/Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting create mode 100644 data/2023/neurips/OpenMask3D: Open-Vocabulary 3D Instance Segmentation create mode 100644 data/2023/neurips/OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning create mode 100644 data/2023/neurips/Operation-Level Early Stopping for Robustifying Differentiable NAS create mode 100644 data/2023/neurips/Operator Learning with Neural Fields: Tackling PDEs on General Geometries create mode 100644 data/2023/neurips/Optimal Algorithms for the Inhomogeneous Spiked Wigner Model create mode 100644 data/2023/neurips/Optimal Block-wise Asymmetric Graph Construction for Graph-based Semi-supervised Learning create mode 100644 
data/2023/neurips/Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes create mode 100644 data/2023/neurips/Optimal Extragradient-Based Algorithms for Stochastic Variational Inequalities with Separable Structure create mode 100644 data/2023/neurips/Optimal Time Complexities of Parallel Stochastic Optimization Methods Under a Fixed Computation Model create mode 100644 data/2023/neurips/Optimal Transport for Treatment Effect Estimation create mode 100644 data/2023/neurips/Optimal testing using combined test statistics across independent studies create mode 100644 data/2023/neurips/Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL create mode 100644 data/2023/neurips/Optimizing Prompts for Text-to-Image Generation create mode 100644 data/2023/neurips/Order Matters in the Presence of Dataset Imbalance for Multilingual Learning create mode 100644 data/2023/neurips/Out-of-distribution Detection Learning with Unreliable Out-of-distribution Sources create mode 100644 data/2023/neurips/P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting create mode 100644 data/2023/neurips/PAC Learning Linear Thresholds from Label Proportions create mode 100644 data/2023/neurips/PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers create mode 100644 data/2023/neurips/PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model create mode 100644 data/2023/neurips/POMDP Planning for Object Search in Partially Unknown Environment create mode 100644 data/2023/neurips/PPi: Pretraining Brain Signal Model for Patient-independent Seizure Detection create mode 100644 data/2023/neurips/PUCA: Patch-Unshuffle and Channel Attention for Enhanced Self-Supervised Image Denoising create mode 100644 data/2023/neurips/ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP create mode 100644 data/2023/neurips/Parallel Submodular Function Minimization create mode 100644 data/2023/neurips/Parallel-mentoring for Offline Model-based Optimization create mode 100644 data/2023/neurips/Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense create mode 100644 data/2023/neurips/Partial Label Learning with Dissimilarity Propagation guided Candidate Label Shrinkage create mode 100644 data/2023/neurips/Participatory Personalization in Classification create mode 100644 data/2023/neurips/Particle-based Variational Inference with Generalized Wasserstein Gradient Flow create mode 100644 data/2023/neurips/Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models create mode 100644 data/2023/neurips/Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution create mode 100644 data/2023/neurips/Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width create mode 100644 data/2023/neurips/Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties create mode 100644 data/2023/neurips/PlanE: Representation Learning over Planar Graphs create mode 100644 data/2023/neurips/PoET: A generative model of protein families as sequences-of-sequences create mode 100644 data/2023/neurips/Policy Space Diversity for Non-Transitive Games create mode 100644 data/2023/neurips/PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models create mode 100644 data/2023/neurips/Polynomially Over-Parameterized 
Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets create mode 100644 data/2023/neurips/Post Hoc Explanations of Language Models Can Improve Language Models create mode 100644 data/2023/neurips/Post-processing Private Synthetic Data for Improving Utility on Selected Measures create mode 100644 data/2023/neurips/PrObeD: Proactive Object Detection Wrapper create mode 100644 data/2023/neurips/PreDiff: Precipitation Nowcasting with Latent Diffusion Models create mode 100644 data/2023/neurips/Preconditioning Matters: Fast Global Convergence of Non-convex Matrix Factorization via Scaled Gradient Descent create mode 100644 data/2023/neurips/Predicting a Protein's Stability under a Million Mutations create mode 100644 data/2023/neurips/Prediction and Control in Continual Reinforcement Learning create mode 100644 data/2023/neurips/Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation create mode 100644 data/2023/neurips/Privacy Amplification via Compression: Achieving the Optimal Privacy-Accuracy-Communication Trade-off in Distributed Mean Estimation create mode 100644 data/2023/neurips/Private (Stochastic) Non-Convex Optimization Revisited: Second-Order Stationary Points and Excess Risks create mode 100644 data/2023/neurips/Private estimation algorithms for stochastic block models and mixture models create mode 100644 data/2023/neurips/ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab create mode 100644 data/2023/neurips/ProPILE: Probing Privacy Leakage in Large Language Models create mode 100644 data/2023/neurips/Probabilistic Invariant Learning with Randomized Linear Classifiers create mode 100644 data/2023/neurips/Projection-Free Methods for Stochastic Simple Bilevel Optimization with Convex Lower-level Problem create mode 100644 data/2023/neurips/Projection-Free Online Convex Optimization via Efficient Newton Iterations create mode 100644 data/2023/neurips/PromptIR: Prompting for All-in-One Image Restoration create mode 100644 data/2023/neurips/Propagating Knowledge Updates to LMs Through Distillation create mode 100644 data/2023/neurips/ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design create mode 100644 data/2023/neurips/ProteinInvBench: Benchmarking Protein Inverse Folding on Diverse Tasks, Models, and Metrics create mode 100644 data/2023/neurips/Provable benefits of annealing for estimating normalizing constants: Importance Sampling, Noise-Contrastive Estimation, and beyond create mode 100644 data/2023/neurips/Provable benefits of score matching create mode 100644 data/2023/neurips/Provable convergence guarantees for black-box variational inference create mode 100644 data/2023/neurips/Provably (More) Sample-Efficient Offline RL with Options create mode 100644 data/2023/neurips/Provably Bounding Neural Network Preimages create mode 100644 data/2023/neurips/Proximity-Informed Calibration for Deep Neural Networks create mode 100644 data/2023/neurips/Q-DM: An Efficient Low-bit Quantized Diffusion Model create mode 100644 data/2023/neurips/QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules create mode 100644 data/2023/neurips/Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing create mode 100644 data/2023/neurips/Query-based Temporal Fusion with Explicit Motion for 3D Object Detection create mode 100644 data/2023/neurips/RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths create mode 100644 data/2023/neurips/RECESS Vaccine 
for Federated Learning: Proactive Defense Against Model Poisoning Attacks create mode 100644 data/2023/neurips/RL-based Stateful Neural Adaptive Sampling and Denoising for Real-Time Path Tracing create mode 100644 data/2023/neurips/RaLEs: a Benchmark for Radiology Language Evaluations create mode 100644 data/2023/neurips/RanPAC: Random Projections and Pre-trained Models for Continual Learning create mode 100644 data/2023/neurips/Random Cuts are Optimal for Explainable k-Medians create mode 100644 data/2023/neurips/RangePerception: Taming LiDAR Range View for Efficient and Accurate 3D Object Detection create mode 100644 data/2023/neurips/Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization create mode 100644 data/2023/neurips/ReMaX: Relaxing for Better Training on Efficient Panoptic Segmentation create mode 100644 data/2023/neurips/Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals create mode 100644 data/2023/neurips/Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding create mode 100644 data/2023/neurips/Recasting Continual Learning as Sequence Modeling create mode 100644 data/2023/neurips/Recovering Simultaneously Structured Data via Non-Convex Iteratively Reweighted Least Squares create mode 100644 data/2023/neurips/Recurrent Temporal Revision Graph Networks create mode 100644 data/2023/neurips/Recursion in Recursion: Two-Level Nested Recursion for Length Generalization with Scalability create mode 100644 data/2023/neurips/Red Teaming Deep Neural Networks with Feature Synthesis Tools create mode 100644 data/2023/neurips/Regret Matching+: (In)Stability and Fast Convergence in Games create mode 100644 data/2023/neurips/Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time create mode 100644 data/2023/neurips/Rehearsal Learning for Avoiding Undesired Future create mode 100644 data/2023/neurips/Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark create mode 100644 data/2023/neurips/Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models create mode 100644 data/2023/neurips/Reinforcement Learning with Fast and Forgetful Memory create mode 100644 data/2023/neurips/Reining Generalization in Offline Reinforcement Learning via Representation Distinction create mode 100644 data/2023/neurips/Relative Entropic Optimal Transport: a (Prior-aware) Matching Perspective to (Unbalanced) Classification create mode 100644 data/2023/neurips/Reliable Off-Policy Learning for Dosage Combinations create mode 100644 data/2023/neurips/Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective create mode 100644 data/2023/neurips/Replicability in Reinforcement Learning create mode 100644 data/2023/neurips/Representation Equivalent Neural Operators: a Framework for Alias-free Operator Learning create mode 100644 data/2023/neurips/Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests create mode 100644 data/2023/neurips/ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting create mode 100644 data/2023/neurips/Resetting the Optimizer in Deep RL: An Empirical Study create mode 100644 data/2023/neurips/Residual Alignment: Uncovering the Mechanisms of Residual Networks create mode 100644 data/2023/neurips/Resilient Constrained Learning create mode 100644 data/2023/neurips/Response Length Perception and Sequence Scheduling: 
An LLM-Empowered LLM Inference Pipeline create mode 100644 data/2023/neurips/Responsible AI (RAI) Games and Ensembles create mode 100644 data/2023/neurips/Restart Sampling for Improving Generative Processes create mode 100644 data/2023/neurips/Restless Bandits with Average Reward: Breaking the Uniform Global Attractor Assumption create mode 100644 data/2023/neurips/Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition create mode 100644 data/2023/neurips/Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial? create mode 100644 data/2023/neurips/Revealing the unseen: Benchmarking video action recognition under occlusion create mode 100644 data/2023/neurips/Revisiting Adversarial Robustness Distillation from the Perspective of Robust Fairness create mode 100644 data/2023/neurips/Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threat Models create mode 100644 data/2023/neurips/Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evaluations create mode 100644 data/2023/neurips/Revisiting the Evaluation of Image Synthesis with GANs create mode 100644 data/2023/neurips/Reward Imputation with Sketching for Contextual Batched Bandits create mode 100644 data/2023/neurips/Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement create mode 100644 data/2023/neurips/Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards create mode 100644 data/2023/neurips/Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation create mode 100644 data/2023/neurips/Riemannian Residual Neural Networks create mode 100644 data/2023/neurips/Riemannian SAM: Sharpness-Aware Minimization on Riemannian Manifolds create mode 100644 data/2023/neurips/Riemannian stochastic optimization methods avoid strict saddle points create mode 100644 data/2023/neurips/Risk-Averse Active Sensing for Timely Outcome Prediction under Cost Pressure create mode 100644 data/2023/neurips/Robust Bayesian Satisficing create mode 100644 data/2023/neurips/Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy create mode 100644 data/2023/neurips/Robust Learning with Progressive Data Expansion Against Spurious Correlation create mode 100644 data/2023/neurips/Robust Matrix Sensing in the Semi-Random Model create mode 100644 data/2023/neurips/Robust Mean Estimation Without Moments for Symmetric Distributions create mode 100644 data/2023/neurips/Robust Model Reasoning and Fitting via Dual Sparsity Pursuit create mode 100644 data/2023/neurips/Robust and Actively Secure Serverless Collaborative Learning create mode 100644 data/2023/neurips/Robust covariance estimation with missing values and cell-wise contamination create mode 100644 data/2023/neurips/Robust low-rank training via approximate orthonormal constraints create mode 100644 data/2023/neurips/SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models create mode 100644 data/2023/neurips/SALSA VERDE: a machine learning attack on LWE with sparse small secrets create mode 100644 data/2023/neurips/SAME: Uncovering GNN Black Box with Structure-aware Shapley-based Multipiece Explanations create mode 100644 data/2023/neurips/SANFlow: Semantic-Aware Normalizing Flow for Anomaly Detection create mode 100644 data/2023/neurips/SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation 
create mode 100644 data/2023/neurips/SE(3) Equivariant Augmented Coupling Flows create mode 100644 data/2023/neurips/SEEDS: Exponential SDE Solvers for Fast High-Quality Sampling from Diffusion Models create mode 100644 data/2023/neurips/SEVA: Leveraging sketches to evaluate alignment between human and machine visual abstraction create mode 100644 data/2023/neurips/SLM: A Smoothed First-Order Lagrangian Method for Structured Constrained Nonconvex Optimization create mode 100644 data/2023/neurips/SLaM: Student-Label Mixing for Distillation with Unlabeled Examples create mode 100644 data/2023/neurips/SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation create mode 100644 data/2023/neurips/SPA: A Graph Spectral Alignment Perspective for Domain Adaptation create mode 100644 data/2023/neurips/SPACE: Single-round Participant Amalgamation for Contribution Evaluation in Federated Learning create mode 100644 data/2023/neurips/SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning create mode 100644 data/2023/neurips/STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events create mode 100644 data/2023/neurips/STEVE-1: A Generative Model for Text-to-Behavior in Minecraft create mode 100644 data/2023/neurips/SUPA: A Lightweight Diagnostic Simulator for Machine Learning in Particle Physics create mode 100644 data/2023/neurips/SaVeNet: A Scalable Vector Network for Enhanced Molecular Representation Learning create mode 100644 data/2023/neurips/SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations create mode 100644 data/2023/neurips/Safety Verification of Decision-Tree Policies in Continuous Time create mode 100644 data/2023/neurips/Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling create mode 100644 data/2023/neurips/Sample Complexity of Goal-Conditioned Hierarchical Reinforcement Learning create mode 100644 data/2023/neurips/Sample-efficient Multi-objective Molecular Optimization with GFlowNets create mode 100644 data/2023/neurips/SatBird: a Dataset for Bird Species Distribution Modeling using Remote Sensing and Citizen Science Data create mode 100644 data/2023/neurips/SatLM: Satisfiability-Aided Language Models Using Declarative Prompting create mode 100644 data/2023/neurips/Scalable 3D Captioning with Pretrained Models create mode 100644 data/2023/neurips/Scalable Fair Influence Maximization create mode 100644 data/2023/neurips/ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection create mode 100644 data/2023/neurips/Scaling Open-Vocabulary Object Detection create mode 100644 data/2023/neurips/Scaling Riemannian Diffusion Models create mode 100644 data/2023/neurips/Scaling laws for language encoding models in fMRI create mode 100644 data/2023/neurips/Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer create mode 100644 data/2023/neurips/Scenario Diffusion: Controllable Driving Scenario Generation With Diffusion create mode 100644 data/2023/neurips/Searching for Optimal Per-Coordinate Step-sizes with Multidimensional Backtracking create mode 100644 data/2023/neurips/Secure Out-of-Distribution Task Generalization with Energy-Based Models create mode 100644 data/2023/neurips/Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation create mode 100644 data/2023/neurips/SegRefiner: Towards Model-Agnostic 
Segmentation Refinement with Discrete Diffusion Process create mode 100644 data/2023/neurips/Segment Anything in High Quality create mode 100644 data/2023/neurips/Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models create mode 100644 data/2023/neurips/Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning create mode 100644 data/2023/neurips/Self-Adaptive Motion Tracking against On-body Displacement of Flexible Sensors create mode 100644 data/2023/neurips/Self-Chained Image-Language Model for Video Localization and Question Answering create mode 100644 data/2023/neurips/Self-Correcting Bayesian Optimization through Bayesian Active Learning create mode 100644 data/2023/neurips/Self-Predictive Universal AI create mode 100644 data/2023/neurips/Self-supervised video pretraining yields robust and more human-aligned visual representations create mode 100644 data/2023/neurips/Sequential Subset Matching for Dataset Distillation create mode 100644 data/2023/neurips/Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots create mode 100644 data/2023/neurips/Shape Non-rigid Kinematics (SNK): A Zero-Shot Method for Non-Rigid Shape Matching via Unsupervised Functional Map Regularized Reconstruction create mode 100644 data/2023/neurips/Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization create mode 100644 data/2023/neurips/Should Under-parameterized Student Networks Copy or Average Teacher Weights? create mode 100644 data/2023/neurips/SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization create mode 100644 data/2023/neurips/Simple, Scalable and Effective Clustering via One-Dimensional Projections create mode 100644 data/2023/neurips/Simplifying and Empowering Transformers for Large-Graph Representations create mode 100644 data/2023/neurips/Single-Call Stochastic Extragradient Methods for Structured Non-monotone Variational Inequalities: Improved Analysis under Weaker Conditions create mode 100644 data/2023/neurips/SituatedGen: Incorporating Geographical and Temporal Contexts into Generative Commonsense Reasoning create mode 100644 data/2023/neurips/Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions create mode 100644 data/2023/neurips/Slot-guided Volumetric Object Radiance Fields create mode 100644 data/2023/neurips/Small batch deep reinforcement learning create mode 100644 data/2023/neurips/Smooth, exact rotational symmetrization for deep learning on point clouds create mode 100644 data/2023/neurips/Smoothed Analysis of Sequential Probability Assignment create mode 100644 data/2023/neurips/SoTTA: Robust Test-Time Adaptation on Noisy Data Streams create mode 100644 data/2023/neurips/Social Motion Prediction with Cognitive Hierarchies create mode 100644 data/2023/neurips/Softmax Output Approximation for Activation Memory-Efficient Training of Attention-based Networks create mode 100644 data/2023/neurips/Solving a Class of Non-Convex Minimax Optimization in Federated Learning create mode 100644 data/2023/neurips/Sparse Deep Learning for Time Series Data: Theory and Applications create mode 100644 data/2023/neurips/Sparse Modular Activation for Efficient Sequence Modeling create mode 100644 data/2023/neurips/Sparse Parameterization for Epitomic Dataset Distillation create mode 100644 data/2023/neurips/Sparsity-Preserving Differentially Private Training of Large Embedding Models create mode 100644 
data/2023/neurips/Spatial-frequency channels, shape bias, and adversarial robustness create mode 100644 data/2023/neurips/Spatially Resolved Gene Expression Prediction from Histology Images via Bi-modal Contrastive Learning create mode 100644 data/2023/neurips/Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning create mode 100644 data/2023/neurips/Spike-driven Transformer create mode 100644 data/2023/neurips/Spontaneous symmetry breaking in generative diffusion models create mode 100644 data/2023/neurips/Squared Neural Families: A New Class of Tractable Density Models create mode 100644 data/2023/neurips/Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective create mode 100644 data/2023/neurips/Stabilizing the Optimization of Neural Signed Distance Functions and Finer Shape Representation create mode 100644 data/2023/neurips/Stable Bias: Evaluating Societal Representations in Diffusion Models create mode 100644 data/2023/neurips/Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures create mode 100644 data/2023/neurips/StableFDG: Style and Attention Based Learning for Federated Domain Generalization create mode 100644 data/2023/neurips/State Sequences Prediction via Fourier Transform for Representation Learning create mode 100644 data/2023/neurips/State-Action Similarity-Based Representations for Off-Policy Evaluation create mode 100644 data/2023/neurips/State-space models with layer-wise nonlinearity are universal approximators with exponential decaying memory create mode 100644 data/2023/neurips/State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding create mode 100644 data/2023/neurips/Static and Sequential Malicious Attacks in the Context of Selective Forgetting create mode 100644 data/2023/neurips/Statistical Analysis of Quantum State Learning Process in Quantum Neural Networks create mode 100644 data/2023/neurips/Statistical Knowledge Assessment for Large Language Models create mode 100644 data/2023/neurips/Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits create mode 100644 data/2023/neurips/Statistically Valid Variable Importance Assessment through Conditional Permutations create mode 100644 "data/2023/neurips/Stein \316\240-Importance Sampling" create mode 100644 data/2023/neurips/Stochastic Optimal Control for Collective Variable Free Sampling of Molecular Transition Paths create mode 100644 data/2023/neurips/StoryBench: A Multifaceted Benchmark for Continuous Story Visualization create mode 100644 data/2023/neurips/Strategic Apple Tasting create mode 100644 data/2023/neurips/Streaming Algorithms and Lower Bounds for Estimating Correlation Clustering Cost create mode 100644 data/2023/neurips/Streaming PCA for Markovian Data create mode 100644 data/2023/neurips/Strong and Precise Modulation of Human Percepts via Robustified ANNs create mode 100644 data/2023/neurips/Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects create mode 100644 data/2023/neurips/Structure of universal formulas create mode 100644 data/2023/neurips/Structured Neural Networks for Density Estimation and Causal Inference create mode 100644 data/2023/neurips/Structured Neural-PI Control with End-to-End Stability and Output Tracking Guarantees create mode 100644 data/2023/neurips/Structured Prediction with Stronger Consistency Guarantees create mode 100644 data/2023/neurips/StyleDrop: Text-to-Image Synthesis of Any Style create mode 
100644 data/2023/neurips/StyleGAN knows Normal, Depth, Albedo, and More create mode 100644 data/2023/neurips/Sub-optimality of the Naive Mean Field approximation for proportional high-dimensional Linear Regression create mode 100644 data/2023/neurips/Subclass-Dominant Label Noise: A Counterexample for the Success of Early Stopping create mode 100644 data/2023/neurips/Successor-Predecessor Intrinsic Exploration create mode 100644 data/2023/neurips/Suggesting Variable Order for Cylindrical Algebraic Decomposition via Reinforcement Learning create mode 100644 data/2023/neurips/Survival Permanental Processes for Survival Analysis with Time-Varying Covariates create mode 100644 data/2023/neurips/SustainGym: Reinforcement Learning Environments for Sustainable Energy Systems create mode 100644 data/2023/neurips/SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models create mode 100644 data/2023/neurips/Swarm Reinforcement Learning for Adaptive Mesh Refinement create mode 100644 data/2023/neurips/SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks create mode 100644 data/2023/neurips/Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials create mode 100644 data/2023/neurips/Synthetic-to-Real Pose Estimation with Geometric Reconstruction create mode 100644 data/2023/neurips/Systematic Visual Reasoning through Object-Centric Relational Abstraction create mode 100644 data/2023/neurips/T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation create mode 100644 data/2023/neurips/TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning create mode 100644 data/2023/neurips/TFLEX: Temporal Feature-Logic Embedding Framework for Complex Reasoning over Temporal Knowledge Graph create mode 100644 data/2023/neurips/TOA: Task-oriented Active VQA create mode 100644 data/2023/neurips/TRIAGE: Characterizing and auditing training data for improved regression create mode 100644 data/2023/neurips/Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds create mode 100644 data/2023/neurips/Tailoring Self-Attention for Graph via Rooted Subtrees create mode 100644 data/2023/neurips/Taming Local Effects in Graph-based Spatiotemporal Forecasting create mode 100644 data/2023/neurips/Tanimoto Random Features for Scalable Molecular Machine Learning create mode 100644 data/2023/neurips/Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models create mode 100644 data/2023/neurips/Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training create mode 100644 data/2023/neurips/Temporal Continual Learning with Prior Compensation for Human Motion Prediction create mode 100644 data/2023/neurips/Test-Time Amendment with a Coarse Classifier for Fine-Grained Classification create mode 100644 data/2023/neurips/Test-time Training for Matching-based Video Object Segmentation create mode 100644 data/2023/neurips/Text Alignment Is An Efficient Unified Model for Massive NLP Tasks create mode 100644 data/2023/neurips/Textually Pretrained Speech Language Models create mode 100644 data/2023/neurips/The Behavior and Convergence of Local Bayesian Optimization create mode 100644 data/2023/neurips/The Best of Both Worlds in Network Population Games: Reaching Consensus and Convergence to Equilibrium create mode 100644 data/2023/neurips/The Crucial Role of Normalization in 
Sharpness-Aware Minimization create mode 100644 data/2023/neurips/The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model create mode 100644 data/2023/neurips/The Distortion of Binomial Voting Defies Expectation create mode 100644 data/2023/neurips/The Equivalence of Dynamic and Strategic Stability under Regularized Learning in Games create mode 100644 data/2023/neurips/The Gain from Ordering in Online Learning create mode 100644 data/2023/neurips/The Grand Illusion: The Myth of Software Portability and Implications for ML Progress create mode 100644 data/2023/neurips/The Graph Pencil Method: Mapping Subgraph Densities to Stochastic Block Models create mode 100644 data/2023/neurips/The Learnability of In-Context Learning create mode 100644 data/2023/neurips/The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data Only create mode 100644 data/2023/neurips/The Rise of AI Language Pathologists: Exploring Two-level Prompt Learning for Few-shot Weakly-supervised Whole Slide Image Classification create mode 100644 data/2023/neurips/The Transient Nature of Emergent In-Context Learning in Transformers create mode 100644 data/2023/neurips/The Tunnel Effect: Building Data Representations in Deep Neural Networks create mode 100644 data/2023/neurips/The expressive power of pooling in Graph Neural Networks create mode 100644 data/2023/neurips/The noise level in linear regression with dependent data create mode 100644 data/2023/neurips/The probability flow ODE is provably fast create mode 100644 data/2023/neurips/The s-value: evaluating stability with respect to distributional shifts create mode 100644 data/2023/neurips/Theoretical Analysis of the Inductive Biases in Deep Convolutional Networks create mode 100644 data/2023/neurips/Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance create mode 100644 data/2023/neurips/Thrust: Adaptively Propels Large Language Models with External Knowledge create mode 100644 data/2023/neurips/Tight Bounds for Volumetric Spanners and Applications create mode 100644 data/2023/neurips/Tight Risk Bounds for Gradient Descent on Separable Data create mode 100644 data/2023/neurips/Time Series as Images: Vision Transformer for Irregularly Sampled Time Series create mode 100644 data/2023/neurips/Toolformer: Language Models Can Teach Themselves to Use Tools create mode 100644 data/2023/neurips/Top-Ambiguity Samples Matter: Understanding Why Deep Ensemble Works in Selective Classification create mode 100644 data/2023/neurips/TopoSRL: Topology preserving self-supervised Simplicial Representation Learning create mode 100644 data/2023/neurips/Topological RANSAC for instance verification and retrieval without fine-tuning create mode 100644 data/2023/neurips/Towards Better Dynamic Graph Learning: New Architecture and Unified Library create mode 100644 data/2023/neurips/Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask? 
create mode 100644 data/2023/neurips/Towards Data-Algorithm Dependent Generalization: a Case Study on Overparameterized Linear Regression create mode 100644 data/2023/neurips/Towards Distribution-Agnostic Generalized Category Discovery create mode 100644 data/2023/neurips/Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior create mode 100644 data/2023/neurips/Towards Higher Ranks via Adversarial Weight Pruning create mode 100644 data/2023/neurips/Towards In-context Scene Understanding create mode 100644 data/2023/neurips/Towards Label-free Scene Understanding by Vision Foundation Models create mode 100644 data/2023/neurips/Towards Last-layer Retraining for Group Robustness with Fewer Annotations create mode 100644 data/2023/neurips/Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective create mode 100644 data/2023/neurips/Towards Semi-Structured Automatic ICD Coding via Tree-based Contrastive Learning create mode 100644 data/2023/neurips/Towards Stable Backdoor Purification through Feature Shift Tuning create mode 100644 data/2023/neurips/Towards a Comprehensive Benchmark for High-Level Synthesis Targeted to FPGAs create mode 100644 data/2023/neurips/Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift create mode 100644 data/2023/neurips/Towards a Unified Framework of Contrastive Learning for Disentangled Representations create mode 100644 data/2023/neurips/Towards a fuller understanding of neurons with Clustered Compositional Explanations create mode 100644 data/2023/neurips/TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs create mode 100644 data/2023/neurips/Train Hard, Fight Easy: Robust Meta Reinforcement Learning create mode 100644 data/2023/neurips/Training Chain-of-Thought via Latent-Variable Inference create mode 100644 "data/2023/neurips/Training Fully Connected Neural Networks is \342\210\203R-Complete" create mode 100644 data/2023/neurips/Training on Foveated Images Improves Robustness to Adversarial Attacks create mode 100644 data/2023/neurips/Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis create mode 100644 data/2023/neurips/Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory create mode 100644 data/2023/neurips/Transformed Low-Rank Parameterization Can Help Robust Generalization for Tensor Neural Networks create mode 100644 data/2023/neurips/Transient Neural Radiance Fields for Lidar View Synthesis and 3D Reconstruction create mode 100644 data/2023/neurips/TriRE: A Multi-Mechanism Learning Paradigm for Continual Knowledge Retention and Promotion create mode 100644 data/2023/neurips/Trial matching: capturing variability with data-constrained spiking neural networks create mode 100644 data/2023/neurips/TrojLLM: A Black-box Trojan Prompt Attack on Large Language Models create mode 100644 data/2023/neurips/Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data create mode 100644 data/2023/neurips/Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods create mode 100644 data/2023/neurips/Two-Stage Learning to Defer with Multiple Experts create mode 100644 data/2023/neurips/Type-to-Track: Retrieve Any Object via Prompt-based Tracking create mode 100644 data/2023/neurips/UDC-SIT: A Real-World Dataset for Under-Display Cameras create mode 100644 data/2023/neurips/UE4-NeRF: Neural Radiance Field for 
Real-Time Rendering of Large-Scale Scene create mode 100644 data/2023/neurips/UP-NeRF: Unconstrained Pose Prior-Free Neural Radiance Field create mode 100644 data/2023/neurips/UltraRE: Enhancing RecEraser for Recommendation Unlearning via Error Decomposition create mode 100644 data/2023/neurips/Unbiased learning of deep generative models with structured discrete representations create mode 100644 data/2023/neurips/Uncertainty-Aware Alignment Network for Cross-Domain Video-Text Retrieval create mode 100644 data/2023/neurips/Uncertainty-Aware Instance Reweighting for Off-Policy Learning create mode 100644 data/2023/neurips/Unconstrained Dynamic Regret via Sparse Coding create mode 100644 data/2023/neurips/Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation create mode 100644 data/2023/neurips/Uncovering the Hidden Dynamics of Video Self-supervised Learning under Distribution Shifts create mode 100644 data/2023/neurips/Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation create mode 100644 data/2023/neurips/Understanding Few-Shot Learning: Measuring Task Relatedness and Adaptation Difficulty via Attributes create mode 100644 data/2023/neurips/Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization create mode 100644 data/2023/neurips/Understanding and Improving Feature Learning for Out-of-Distribution Generalization create mode 100644 data/2023/neurips/Understanding the Limitations of Deep Models for Molecular property prediction: Insights and Solutions create mode 100644 data/2023/neurips/Understanding the detrimental class-level effects of data augmentation create mode 100644 data/2023/neurips/UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models create mode 100644 data/2023/neurips/Uniform Convergence with Square-Root Lipschitz Loss create mode 100644 data/2023/neurips/Universality and Limitations of Prompt Tuning create mode 100644 data/2023/neurips/Unleashing the Power of Randomization in Auditing Differentially Private ML create mode 100644 data/2023/neurips/Unsupervised Anomaly Detection with Rejection create mode 100644 data/2023/neurips/Unsupervised Graph Neural Architecture Search with Disentangled Self-Supervision create mode 100644 data/2023/neurips/Unsupervised Image Denoising with Score Function create mode 100644 data/2023/neurips/Unsupervised Polychromatic Neural Representation for CT Metal Artifact Reduction create mode 100644 data/2023/neurips/Using Imperfect Surrogates for Downstream Inference: Design-based Supervised Learning for Social Science Applications of Large Language Models create mode 100644 data/2023/neurips/Utilitarian Algorithm Configuration create mode 100644 data/2023/neurips/VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset create mode 100644 data/2023/neurips/VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models create mode 100644 data/2023/neurips/VaRT: Variational Regression Trees create mode 100644 data/2023/neurips/Variational Annealing on Graphs for Combinatorial Optimization create mode 100644 data/2023/neurips/Video Prediction Models as Rewards for Reinforcement Learning create mode 100644 data/2023/neurips/Video-Mined Task Graphs for Keystep Recognition in Instructional Videos create mode 100644 data/2023/neurips/VisAlign: Dataset for Measuring the Alignment between AI and Humans in Visual Perception create mode 100644 data/2023/neurips/VisoGender: A dataset for 
benchmarking gender bias in image-text pronoun resolution create mode 100644 data/2023/neurips/Visual Instruction Inversion: Image Editing via Image Prompting create mode 100644 data/2023/neurips/Volume Feature Rendering for Fast Neural Radiance Field Reconstruction create mode 100644 "data/2023/neurips/Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schr\303\266dinger Equation" create mode 100644 data/2023/neurips/Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets create mode 100644 data/2023/neurips/What Can We Learn from Unlearnable Datasets? create mode 100644 data/2023/neurips/What Truly Matters in Trajectory Prediction for Autonomous Driving? create mode 100644 data/2023/neurips/What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation create mode 100644 data/2023/neurips/When Demonstrations meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning create mode 100644 data/2023/neurips/When Do Neural Nets Outperform Boosted Trees on Tabular Data? create mode 100644 data/2023/neurips/When Does Confidence-Based Cascade Deferral Suffice? create mode 100644 data/2023/neurips/When Does Optimizing a Proper Loss Yield Calibration? create mode 100644 data/2023/neurips/Where Did I Come From? Origin Attribution of AI-Generated Images create mode 100644 data/2023/neurips/Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? create mode 100644 data/2023/neurips/Where2Explore: Few-shot Affordance Learning for Unseen Novel Categories of Articulated Objects create mode 100644 data/2023/neurips/Why Does Sharpness-Aware Minimization Generalize Better Than SGD? create mode 100644 data/2023/neurips/Why think step by step? Reasoning emerges from the locality of experience create mode 100644 data/2023/neurips/WildfireSpreadTS: A dataset of multi-modal time series for wildfire spread prediction create mode 100644 data/2023/neurips/Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model create mode 100644 data/2023/neurips/Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations create mode 100644 data/2023/neurips/XAGen: 3D Expressive Human Avatars Generation create mode 100644 data/2023/neurips/Zero-One Laws of Graph Neural Networks create mode 100644 data/2023/neurips/ZipLM: Inference-Aware Structured Pruning of Language Models create mode 100644 data/2023/neurips/f-Policy Gradients: A General Framework for Goal-Conditioned RL using f-Divergences create mode 100644 data/2023/neurips/k-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy create mode 100644 data/2023/neurips/rPPG-Toolbox: Deep Remote PPG Toolbox create mode 100644 data/2023/neurips/xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data diff --git a/data/2018/vldb/Declarative Recursive Computation on an RDBMS b/data/2018/vldb/Declarative Recursive Computation on an RDBMS new file mode 100644 index 0000000000..2e630faf78 --- /dev/null +++ b/data/2018/vldb/Declarative Recursive Computation on an RDBMS @@ -0,0 +1 @@ +We explore the close relationship between the tensor-based computations performed during modern machine learning, and relational database computations. We consider how to make a very small set of changes to a modern RDBMS to make it suitable for distributed learning computations. 
Changes include adding better support for recursion, and the optimization and execution of very large compute plans. We also show that there are key advantages to using an RDBMS as a machine learning platform. In particular, DBMS-based learning allows for trivial scaling to large data sets and especially large models, where different computational units operate on different parts of a model that may be too large to fit into RAM. \ No newline at end of file diff --git a/data/2020/neurips/(De)Randomized Smoothing for Certifiable Defense against Patch Attacks b/data/2020/neurips/(De)Randomized Smoothing for Certifiable Defense against Patch Attacks new file mode 100644 index 0000000000..c9f427f1b6 --- /dev/null +++ b/data/2020/neurips/(De)Randomized Smoothing for Certifiable Defense against Patch Attacks @@ -0,0 +1 @@ +Patch adversarial attacks on images, in which the attacker can distort pixels within a region of bounded size, are an important threat model since they provide a quantitative model for physical adversarial attacks. In this paper, we introduce a certifiable defense against patch attacks that guarantees that, for a given image and patch attack size, no patch adversarial examples exist. Our method is related to the broad class of randomized smoothing robustness schemes which provide high-confidence probabilistic robustness certificates. By exploiting the fact that patch attacks are more constrained than general sparse attacks, we derive meaningfully large robustness certificates. Additionally, the algorithm we propose is de-randomized, providing deterministic certificates. To the best of our knowledge, there exists only one prior method for certifiable defense against patch attacks, which relies on interval bound propagation. While this sole existing method performs well on MNIST, it has several limitations: it requires computationally expensive training, does not scale to ImageNet, and performs poorly on CIFAR-10. In contrast, our proposed method effectively addresses all of these issues: our classifier can be trained quickly, achieves high clean and certified robust accuracy on CIFAR-10, and provides certificates at the ImageNet scale. For example, for a 5×5 patch attack on CIFAR-10, our method achieves up to around 57.8% certified accuracy (with a classifier with around 83.9% clean accuracy), compared to at most 30.3% certified accuracy for the existing method (with a classifier with around 47.8% clean accuracy), effectively establishing a new state-of-the-art. Code is available at this https URL. \ No newline at end of file diff --git a/data/2020/neurips/3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data b/data/2020/neurips/3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data new file mode 100644 index 0000000000..c95e7ae53c --- /dev/null +++ b/data/2020/neurips/3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data @@ -0,0 +1 @@ +We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views. In such cases, the visual evidence is usually insufficient to identify a 3D reconstruction uniquely, so we aim at recovering several plausible reconstructions compatible with the input data. We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses via a suitable 3D model, such as SMPL for humans.
We propose to learn a multi-hypothesis neural network regressor using a best-of-M loss, where each of the M hypotheses is constrained to lie on a manifold of plausible human poses by means of a generative model. We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans, and in heavily occluded versions of these benchmarks. \ No newline at end of file diff --git a/data/2020/neurips/3D Self-Supervised Methods for Medical Imaging b/data/2020/neurips/3D Self-Supervised Methods for Medical Imaging new file mode 100644 index 0000000000..c9ae299201 --- /dev/null +++ b/data/2020/neurips/3D Self-Supervised Methods for Medical Imaging @@ -0,0 +1 @@ +Self-supervised learning methods have witnessed a recent surge of interest after proving successful in multiple application fields. In this work, we leverage these techniques, and we propose 3D versions for five different self-supervised methods, in the form of proxy tasks. Our methods facilitate neural network feature learning from unlabeled 3D images, aiming to reduce the required cost for expert annotation. The developed algorithms are 3D Contrastive Predictive Coding, 3D Rotation prediction, 3D Jigsaw puzzles, Relative 3D patch location, and 3D Exemplar networks. Our experiments show that pretraining models with our 3D tasks yields more powerful semantic representations, and enables solving downstream tasks more accurately and efficiently, compared to training the models from scratch and to pretraining them on 2D slices. We demonstrate the effectiveness of our methods on three downstream tasks from the medical imaging domain: i) Brain Tumor Segmentation from 3D MRI, ii) Pancreas Tumor Segmentation from 3D CT, and iii) Diabetic Retinopathy Detection from 2D Fundus images. In each task, we assess the gains in data-efficiency, performance, and speed of convergence. Interestingly, we also find gains when transferring the learned representations, by our methods, from a large unlabeled 3D corpus to a small downstream-specific dataset. We achieve results competitive to state-of-the-art solutions at a fraction of the computational expense. We publish our implementations for the developed algorithms (both 3D and 2D versions) as an open-source library, in an effort to allow other researchers to apply and extend our methods on their datasets. \ No newline at end of file diff --git a/data/2020/neurips/3D Shape Reconstruction from Vision and Touch b/data/2020/neurips/3D Shape Reconstruction from Vision and Touch new file mode 100644 index 0000000000..48ad4b44ee --- /dev/null +++ b/data/2020/neurips/3D Shape Reconstruction from Vision and Touch @@ -0,0 +1 @@ +When a toddler is presented a new toy, their instinctual behaviour is to pick it up and inspect it with their hand and eyes in tandem, clearly searching over its surface to properly understand what they are playing with. Here, touch provides high fidelity localized information while vision provides complementary global context. However, in 3D shape reconstruction, the complementary fusion of visual and haptic modalities remains largely unexplored. In this paper, we study this problem and present an effective chart-based approach to fusing vision and touch, which leverages advances in graph convolutional networks. To do so, we introduce a dataset of simulated touch and vision signals from the interaction between a robotic hand and a large array of 3D objects. 
Our results show that (1) leveraging both vision and touch signals consistently improves single-modality baselines; (2) our approach outperforms alternative modality fusion methods and strongly benefits from the proposed chart-based structure; (3) the reconstruction quality increases with the number of grasps provided; and (4) the touch information not only enhances the reconstruction at the touch site but also extrapolates to its local neighborhood. \ No newline at end of file diff --git a/data/2020/neurips/A B Testing in Dense Large-Scale Networks: Design and Inference b/data/2020/neurips/A B Testing in Dense Large-Scale Networks: Design and Inference new file mode 100644 index 0000000000..b98a812441 --- /dev/null +++ b/data/2020/neurips/A B Testing in Dense Large-Scale Networks: Design and Inference @@ -0,0 +1 @@ +Design of experiments and estimation of treatment effects in large-scale networks, in the presence of strong interference, is a challenging and important problem. Most existing methods' performance deteriorates as the density of the network increases. In this paper, we present a novel strategy for accurately estimating the causal effects of a class of treatments in a dense large-scale network. First, we design an approximate randomized controlled experiment by solving an optimization problem to allocate treatments in the presence of competition among neighboring nodes. Then we apply an importance sampling adjustment to correct for any leftover bias (from the approximation) in estimating average treatment effects. We provide theoretical guarantees, verify robustness in a simulation study, and validate the scalability and usefulness of our procedure in a real-world experiment on a large social network. \ No newline at end of file diff --git a/data/2020/neurips/A Bandit Learning Algorithm and Applications to Auction Design b/data/2020/neurips/A Bandit Learning Algorithm and Applications to Auction Design new file mode 100644 index 0000000000..0621844e6a --- /dev/null +++ b/data/2020/neurips/A Bandit Learning Algorithm and Applications to Auction Design @@ -0,0 +1 @@ +We consider online bandit learning in which at every time step, an algorithm has to make a decision and then observe only its reward. The goal is to design efficient (polynomial-time) algorithms that achieve a total reward approximately close to that of the best fixed decision in hindsight. In this paper, we introduce a new notion of $(\lambda, \mu)$-concave functions and present a bandit learning algorithm that achieves a performance guarantee which is characterized as a function of the concavity parameters $\lambda$ and $\mu$. The algorithm is based on the mirror descent algorithm in which the update directions follow the gradient of the multilinear extensions of the reward functions. The regret bound induced by our algorithm is $\widetilde{O}(\sqrt{T})$, which is nearly optimal. We apply our algorithm to auction design, specifically to welfare maximization, revenue maximization, and no-envy learning in auctions. In welfare maximization, we show that a version of fictitious play in smooth auctions guarantees a competitive regret bound which is determined by the smooth parameters. In revenue maximization, we consider the simultaneous second-price auctions with reserve prices in multi-parameter environments. We give a bandit algorithm which achieves total revenue at least $1/2$ times that of the best fixed reserve prices in hindsight.
In no-envy learning, we study the bandit item selection problem where the player valuation is submodular and provide an efficient $1/2$-approximation no-envy algorithm. \ No newline at end of file diff --git a/data/2020/neurips/A Bayesian Nonparametrics View into Deep Representations b/data/2020/neurips/A Bayesian Nonparametrics View into Deep Representations new file mode 100644 index 0000000000..388b753b29 --- /dev/null +++ b/data/2020/neurips/A Bayesian Nonparametrics View into Deep Representations @@ -0,0 +1 @@ +We investigate neural network representations from a probabilistic perspective. Specifically, we leverage Bayesian nonparametrics to construct models of neural activations in Convolutional Neural Networks (CNNs) and latent representations in Variational Autoencoders (VAEs). This allows us to formulate a tractable complexity measure for distributions of neural activations and to explore global structure of latent spaces learned by VAEs. We use this machinery to uncover how memorization and two common forms of regularization, i.e. dropout and input augmentation, influence representational complexity in CNNs. We demonstrate that networks that can exploit patterns in data learn vastly less complex representations than networks forced to memorize. We also show marked differences between effects of input augmentation and dropout, with the latter strongly depending on network width. Next, we investigate latent representations learned by standard β-VAEs and Maximum Mean Discrepancy (MMD) β-VAEs. We show that the aggregated posterior in standard VAEs quickly collapses to the diagonal prior when regularization strength increases. MMD-VAEs, on the other hand, learn more complex posterior distributions, even with strong regularization. While this gives a richer sample space, MMD-VAEs do not exhibit independence of latent dimensions. Finally, we leverage our probabilistic models as an effective sampling strategy for latent codes, improving quality of samples in VAEs with rich posteriors. \ No newline at end of file diff --git a/data/2020/neurips/A Bayesian Perspective on Training Speed and Model Selection b/data/2020/neurips/A Bayesian Perspective on Training Speed and Model Selection new file mode 100644 index 0000000000..f5d734b333 --- /dev/null +++ b/data/2020/neurips/A Bayesian Perspective on Training Speed and Model Selection @@ -0,0 +1 @@ +We take a Bayesian perspective to illustrate a connection between training speed and the marginal likelihood in linear models. This provides two major insights: first, that a measure of a model's training speed can be used to estimate its marginal likelihood. Second, that this measure, under certain conditions, predicts the relative weighting of models in linear model combinations trained to minimize a regression loss. We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks. We further provide encouraging empirical evidence that the intuition developed in these settings also holds for deep neural networks trained with stochastic gradient descent. Our results suggest a promising new direction towards explaining why neural networks trained with stochastic gradient descent are biased towards functions that generalize well.
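The connection described in this last abstract rests on the prequential chain-rule identity log p(D) = sum_i log p(y_i | y_1, ..., y_{i-1}): a model whose one-step-ahead predictions improve quickly during sequential training accumulates a large sum of log predictive scores, which is exactly its log marginal likelihood. The sketch below is not the paper's code; all names and constants are illustrative. It checks the identity numerically for conjugate Bayesian linear regression, where both sides have closed forms.

import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(0)
n, d, alpha, sigma2 = 50, 3, 1.0, 0.1          # illustrative sizes and hyperparameters
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=np.sqrt(sigma2), size=n)

# Closed-form log evidence of the whole dataset: y ~ N(0, sigma2*I + (1/alpha)*X X^T)
evidence = multivariate_normal.logpdf(
    y, mean=np.zeros(n), cov=sigma2 * np.eye(n) + X @ X.T / alpha)

# "Training speed" estimate: accumulate one-step-ahead log predictive scores
# while updating the posterior one observation at a time.
m, S = np.zeros(d), np.eye(d) / alpha          # prior mean and covariance
prequential = 0.0
for x, t in zip(X, y):
    pred_var = sigma2 + x @ S @ x              # predictive variance for the next point
    prequential += norm.logpdf(t, loc=x @ m, scale=np.sqrt(pred_var))
    k = S @ x / pred_var                       # conjugate rank-1 posterior update
    m = m + k * (t - x @ m)
    S = S - np.outer(k, x @ S)

print(evidence, prequential)                   # agree up to floating-point error

Because the chain rule makes the identity exact for this conjugate model, the two printed numbers agree up to numerical error; the abstract's claim is that a training-speed measure of this kind stays informative even for networks trained with stochastic gradient descent.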
\ No newline at end of file diff --git a/data/2020/neurips/A Benchmark for Systematic Generalization in Grounded Language Understanding b/data/2020/neurips/A Benchmark for Systematic Generalization in Grounded Language Understanding new file mode 100644 index 0000000000..bf124dbf81 --- /dev/null +++ b/data/2020/neurips/A Benchmark for Systematic Generalization in Grounded Language Understanding @@ -0,0 +1 @@ +Human language users easily interpret expressions that describe unfamiliar situations composed from familiar parts ("greet the pink brontosaurus by the ferris wheel"). Modern neural networks, by contrast, struggle to interpret compositions unseen in training. In this paper, we introduce a new benchmark, gSCAN, for evaluating compositional generalization in models of situated language understanding. We take inspiration from standard models of meaning composition in formal linguistics. Going beyond an earlier related benchmark that focused on syntactic aspects of generalization, gSCAN defines a language grounded in the states of a grid world. This allows us to build novel generalization tasks that probe the acquisition of linguistically motivated rules. For example, agents must understand how adjectives such as 'small' are interpreted relative to the current world state or how adverbs such as 'cautiously' combine with new verbs. We test a strong multi-modal baseline model and a state-of-the-art compositional method, finding that, in most cases, they fail dramatically when generalization requires systematic compositional rules. \ No newline at end of file diff --git a/data/2020/neurips/A Biologically Plausible Neural Network for Slow Feature Analysis b/data/2020/neurips/A Biologically Plausible Neural Network for Slow Feature Analysis new file mode 100644 index 0000000000..ded41f4c1f --- /dev/null +++ b/data/2020/neurips/A Biologically Plausible Neural Network for Slow Feature Analysis @@ -0,0 +1 @@ +Learning latent features from time series data is an important problem in both machine learning and brain function. One approach, called Slow Feature Analysis (SFA), leverages the slowness of many salient features relative to the rapidly varying input signals. Furthermore, when trained on naturalistic stimuli, SFA reproduces interesting properties of cells in the primary visual cortex and hippocampus, suggesting that the brain uses temporal slowness as a computational principle for learning latent features. However, despite the potential relevance of SFA for modeling brain function, there is currently no SFA algorithm with a biologically plausible neural network implementation, by which we mean an algorithm that operates in the online setting and can be mapped onto a neural network with local synaptic updates. In this work, starting from an SFA objective, we derive an SFA algorithm, called Bio-SFA, with a biologically plausible neural network implementation. We validate Bio-SFA on naturalistic stimuli. \ No newline at end of file diff --git a/data/2020/neurips/A Boolean Task Algebra for Reinforcement Learning b/data/2020/neurips/A Boolean Task Algebra for Reinforcement Learning new file mode 100644 index 0000000000..7858a160d8 --- /dev/null +++ b/data/2020/neurips/A Boolean Task Algebra for Reinforcement Learning @@ -0,0 +1 @@ +We propose a framework for defining a Boolean algebra over the space of tasks. This allows us to formulate new tasks in terms of the negation, disjunction and conjunction of a set of base tasks.
We then show that by learning goal-oriented value functions and restricting the transition dynamics of the tasks, an agent can solve these new tasks with no further learning. We prove that by composing these value functions in specific ways, we immediately recover the optimal policies for all tasks expressible under the Boolean algebra. We verify our approach in two domains, including a high-dimensional video game environment requiring function approximation, where an agent first learns a set of base skills, and then composes them to solve a super-exponential number of new tasks. \ No newline at end of file diff --git a/data/2020/neurips/A Catalyst Framework for Minimax Optimization b/data/2020/neurips/A Catalyst Framework for Minimax Optimization new file mode 100644 index 0000000000..2ee73861db --- /dev/null +++ b/data/2020/neurips/A Catalyst Framework for Minimax Optimization @@ -0,0 +1 @@ +We introduce a generic two-loop scheme for smooth minimax optimization with strongly-convex-concave objectives. Our approach applies the accelerated proximal point framework (or Catalyst) to the associated dual problem and takes full advantage of existing gradient-based algorithms to solve a sequence of well-balanced strongly-convex-strongly-concave minimax problems. Despite its simplicity, this leads to a family of near-optimal algorithms with improved complexity over all existing methods designed for strongly-convex-concave minimax problems. Additionally, we obtain the first variance-reduced algorithms for this class of minimax problems with finite-sum structure and establish a faster convergence rate than batch algorithms. Furthermore, when extended to the nonconvex-concave minimax optimization, our algorithm again achieves the state-of-the-art complexity for finding a stationary point. We carry out several numerical experiments showcasing the superiority of the Catalyst framework in practice. \ No newline at end of file diff --git a/data/2020/neurips/A Causal View on Robustness of Neural Networks b/data/2020/neurips/A Causal View on Robustness of Neural Networks new file mode 100644 index 0000000000..671d79273c --- /dev/null +++ b/data/2020/neurips/A Causal View on Robustness of Neural Networks @@ -0,0 +1 @@ +We present a causal view on the robustness of neural networks against input manipulations, which applies not only to traditional classification tasks but also to general measurement data. Based on this view, we design a deep causal manipulation augmented model (deep CAMA) which explicitly models possible manipulations on certain causes leading to changes in the observed effect. We further develop data augmentation and test-time fine-tuning methods to improve deep CAMA's robustness. When compared with discriminative deep neural networks, our proposed model shows superior robustness against unseen manipulations. As a by-product, our model achieves a disentangled representation which separates the representation of manipulations from those of other latent causes. \ No newline at end of file diff --git a/data/2020/neurips/A Class of Algorithms for General Instrumental Variable Models b/data/2020/neurips/A Class of Algorithms for General Instrumental Variable Models new file mode 100644 index 0000000000..fdeb1c4838 --- /dev/null +++ b/data/2020/neurips/A Class of Algorithms for General Instrumental Variable Models @@ -0,0 +1 @@ +Causal treatment effect estimation is a key problem that arises in a variety of real-world settings, from personalized medicine to governmental policy making.
There has been a flurry of recent work in machine learning on estimating causal effects when one has access to an instrument. However, to achieve identifiability, these methods in general require one-size-fits-all assumptions such as an additive error model for the outcome. An alternative is partial identification, which provides bounds on the causal effect. Little exists in terms of bounding methods that can deal with the most general case, where the treatment itself can be continuous. Moreover, bounding methods generally do not allow for a continuum of assumptions on the shape of the causal effect that can smoothly trade off stronger background knowledge for more informative bounds. In this work, we provide a method for causal effect bounding in continuous distributions, leveraging recent advances in gradient-based methods for the optimization of computationally intractable objective functions. We demonstrate on a set of synthetic and real-world data that our bounds capture the causal effect when additive methods fail, providing a useful range of answers compatible with observation as opposed to relying on unwarranted structural assumptions. \ No newline at end of file diff --git a/data/2020/neurips/A Closer Look at Accuracy vs. Robustness b/data/2020/neurips/A Closer Look at Accuracy vs. Robustness new file mode 100644 index 0000000000..bafd5aa2f6 --- /dev/null +++ b/data/2020/neurips/A Closer Look at Accuracy vs. Robustness @@ -0,0 +1 @@ +Current methods for training robust networks lead to a drop in test accuracy, which has led prior works to posit that a robustness-accuracy tradeoff may be inevitable in deep learning. We take a closer look at this phenomenon and first show that real image datasets are actually separated. With this property in mind, we then prove that robustness and accuracy should both be achievable for benchmark datasets through locally Lipschitz functions, and hence, there should be no inherent tradeoff between robustness and accuracy. Through extensive experiments with robustness methods, we argue that the gap between theory and practice arises from two limitations of current methods: either they fail to impose local Lipschitzness or they are insufficiently generalized. We explore combining dropout with robust training methods and obtain better generalization. We conclude that achieving robustness and accuracy in practice may require using methods that impose local Lipschitzness and augmenting them with deep learning generalization techniques. Code available at this https URL \ No newline at end of file diff --git a/data/2020/neurips/A Closer Look at the Training Strategy for Modern Meta-Learning b/data/2020/neurips/A Closer Look at the Training Strategy for Modern Meta-Learning new file mode 100644 index 0000000000..540f9899ff --- /dev/null +++ b/data/2020/neurips/A Closer Look at the Training Strategy for Modern Meta-Learning @@ -0,0 +1 @@ +The support/query (S/Q) episodic training strategy has been widely used in modern meta-learning algorithms and is believed to improve their generalization ability to test environments. This paper conducts a theoretical investigation of this training strategy on generalization. From a stability perspective, we analyze the generalization error bound of generic meta-learning algorithms trained with such a strategy. We show that the S/Q episodic training strategy naturally leads to a counterintuitive generalization bound of $O(1/\sqrt{n})$, which depends only on the task number $n$ and is independent of the inner-task sample size $m$.
Under the common assumption $m \ll n$ for few-shot learning, the bound of $O(1/\sqrt{n})$ implies strong generalization guarantees for modern meta-learning algorithms in the few-shot regime. To further explore the influence of training strategies on generalization, we propose a leave-one-out (LOO) training strategy for meta-learning and compare it with S/Q training. Experiments on standard few-shot regression and classification tasks with popular meta-learning algorithms validate our analysis. \ No newline at end of file diff --git a/data/2020/neurips/A Combinatorial Perspective on Transfer Learning b/data/2020/neurips/A Combinatorial Perspective on Transfer Learning new file mode 100644 index 0000000000..8b6c852569 --- /dev/null +++ b/data/2020/neurips/A Combinatorial Perspective on Transfer Learning @@ -0,0 +1 @@ +Human intelligence is characterized not only by the capacity to learn complex skills, but the ability to rapidly adapt and acquire new skills within an ever-changing environment. In this work we study how the learning of modular solutions can allow for effective generalization to both unseen and potentially differently distributed data. Our main postulate is that the combination of task segmentation, modular learning and memory-based ensembling can give rise to generalization on an exponentially growing number of unseen tasks. We provide a concrete instantiation of this idea using a combination of: (1) the Forget-Me-Not Process, for task segmentation and memory based ensembling; and (2) Gated Linear Networks, which in contrast to contemporary deep learning techniques use a modular and local learning mechanism. We demonstrate that this system exhibits a number of desirable continual learning properties: robustness to catastrophic forgetting, no negative transfer and increasing levels of positive transfer as more tasks are seen. We show competitive performance against both offline and online methods on standard continual learning benchmarks. \ No newline at end of file diff --git a/data/2020/neurips/A Computational Separation between Private Learning and Online Learning b/data/2020/neurips/A Computational Separation between Private Learning and Online Learning new file mode 100644 index 0000000000..338080e5d0 --- /dev/null +++ b/data/2020/neurips/A Computational Separation between Private Learning and Online Learning @@ -0,0 +1 @@ +A recent line of work has shown a qualitative equivalence between differentially private PAC learning and online learning: A concept class is privately learnable if and only if it is online learnable with a finite mistake bound. However, both directions of this equivalence incur significant losses in both sample and computational efficiency. Studying a special case of this connection, Gonen, Hazan, and Moran (NeurIPS 2019) showed that uniform or highly sample-efficient pure-private learners can be time-efficiently compiled into online learners. We show that, assuming the existence of one-way functions, such an efficient conversion is impossible even for general pure-private learners with polynomial sample complexity. This resolves a question of Neel, Roth, and Wu (FOCS 2019).
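The online-learning side of the equivalence in this last abstract is the finite-mistake-bound model, which the classical halving algorithm makes concrete: any finite concept class C is online learnable with at most log2 |C| mistakes, because each mistake at least halves the set of concepts consistent with the labels seen so far. A self-contained toy sketch (illustrative names and data, not taken from the paper):

import math

# Toy finite concept class: thresholds on 0..10, where concept t labels x as 1 iff x >= t.
concepts = [lambda x, t=t: int(x >= t) for t in range(11)]
version_space = list(concepts)
target = concepts[7]                   # the unknown concept to be learned
mistakes = 0

for x in [3, 9, 5, 7, 6, 8, 2, 7]:     # an arbitrary stream of instances
    votes = sum(c(x) for c in version_space)
    pred = int(2 * votes >= len(version_space))   # predict with the majority
    truth = target(x)
    mistakes += int(pred != truth)
    # Keep only the concepts consistent with the revealed label; on a mistake
    # the majority was wrong, so the version space at least halves.
    version_space = [c for c in version_space if c(x) == truth]

print(mistakes, "<=", math.floor(math.log2(len(concepts))))

The abstract's point is about the cost of converting learners, not this bound itself; the sketch only illustrates what "online learnable with a finite mistake bound" means in the stated equivalence.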
\ No newline at end of file diff --git a/data/2020/neurips/A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval b/data/2020/neurips/A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval new file mode 100644 index 0000000000..d546ca9c5e --- /dev/null +++ b/data/2020/neurips/A Continuous-Time Mirror Descent Approach to Sparse Phase Retrieval @@ -0,0 +1 @@ +We analyze continuous-time mirror descent applied to sparse phase retrieval, which is the problem of recovering sparse signals from a set of magnitude-only measurements. We apply mirror descent to the unconstrained empirical risk minimization problem (batch setting), using the square loss and square measurements. We provide a convergence analysis of the algorithm in this non-convex setting and prove that, with the hypentropy mirror map, mirror descent recovers any $k$-sparse vector $\mathbf{x}^\star\in\mathbb{R}^n$ with minimum (in modulus) non-zero entry on the order of $\| \mathbf{x}^\star \|_2/\sqrt{k}$ from $k^2$ Gaussian measurements, modulo logarithmic terms. This yields a simple algorithm which, unlike most existing approaches to sparse phase retrieval, adapts to the sparsity level, without including thresholding steps or adding regularization terms. Our results also provide a principled theoretical understanding for Hadamard Wirtinger flow [58], as Euclidean gradient descent applied to the empirical risk problem with Hadamard parametrization can be recovered as a first-order approximation to mirror descent in discrete time. \ No newline at end of file diff --git a/data/2020/neurips/A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions b/data/2020/neurips/A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions new file mode 100644 index 0000000000..2de55c3228 --- /dev/null +++ b/data/2020/neurips/A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions @@ -0,0 +1 @@ +We propose an adaptively weighted stochastic gradient Langevin dynamics algorithm (SGLD), so-called contour stochastic gradient Langevin dynamics (CSGLD), for Bayesian learning in big data statistics. The proposed algorithm is essentially a scalable dynamic importance sampler, which automatically flattens the target distribution such that the simulation for a multi-modal distribution can be greatly facilitated. Theoretically, we prove a stability condition and establish the asymptotic convergence of the self-adapting parameter to a unique fixed-point, regardless of the non-convexity of the original energy function; we also present an error analysis for the weighted averaging estimators. Empirically, the CSGLD algorithm is tested on multiple benchmark datasets including CIFAR10 and CIFAR100. The numerical results indicate its superiority over the existing state-of-the-art algorithms in training deep neural networks. \ No newline at end of file diff --git a/data/2020/neurips/A Convolutional Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction b/data/2020/neurips/A Convolutional Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction new file mode 100644 index 0000000000..7a771dec01 --- /dev/null +++ b/data/2020/neurips/A Convolutional Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction @@ -0,0 +1 @@ +Haplotype assembly and viral quasispecies reconstruction are challenging tasks concerned with analysis of genomic mixtures using sequencing data. 
High-throughput sequencing technologies generate enormous amounts of short fragments (reads) which essentially oversample components of a mixture; the representation redundancy enables reconstruction of the components (haplotypes, viral strains). The reconstruction problem, known to be NP-hard, boils down to grouping together reads originating from the same component in a mixture. Existing methods struggle to solve this problem with the required level of accuracy and low runtimes; the problem is becoming increasingly challenging as the number and length of the components increase. This paper proposes a read clustering method based on a convolutional auto-encoder designed to first project sequenced fragments to a low-dimensional space and then estimate the probability of the read origin using learned embedded features. The components are reconstructed by finding consensus sequences that agglomerate reads from the same origin. Mini-batch stochastic gradient descent and dimension reduction of reads allow the proposed method to efficiently deal with massive numbers of long reads. Experiments on simulated, semi-experimental and experimental data demonstrate the ability of the proposed method to accurately reconstruct haplotypes and viral quasispecies, often demonstrating superior performance compared to state-of-the-art methods. \ No newline at end of file diff --git a/data/2020/neurips/A Decentralized Parallel Algorithm for Training Generative Adversarial Nets b/data/2020/neurips/A Decentralized Parallel Algorithm for Training Generative Adversarial Nets new file mode 100644 index 0000000000..b8500a3e9c --- /dev/null +++ b/data/2020/neurips/A Decentralized Parallel Algorithm for Training Generative Adversarial Nets @@ -0,0 +1 @@ +Generative Adversarial Networks (GANs) are a powerful class of generative models in the deep learning community. Current practice on large-scale GAN training~\citep{brock2018large} utilizes large models and distributed large-batch training strategies, and is implemented on deep learning frameworks (e.g., TensorFlow, PyTorch, etc.) designed in a centralized manner. In the centralized network topology, every worker needs to communicate with the central node. However, when the network bandwidth is low or network latency is high, the performance would be significantly degraded. Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner. The main difficulty lies in handling the nonconvex-nonconcave min-max optimization and the decentralized communication simultaneously. In this paper, we address this difficulty by designing the \textbf{first gradient-based decentralized parallel algorithm} which allows workers to have multiple rounds of communication in one iteration and to update the discriminator and generator simultaneously; this design makes the proposed decentralized algorithm amenable to convergence analysis. Theoretically, our proposed decentralized algorithm is able to solve a class of non-convex non-concave min-max problems with provable non-asymptotic convergence to a first-order stationary point. Experimental results on GANs demonstrate the effectiveness of the proposed algorithm.
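To make the decentralized min-max setting concrete, here is a minimal toy sketch, not the paper's algorithm: each worker updates both players of a bilinear saddle problem f(x, y) = xy with an optimistic gradient step (a generic stabilizer for min-max games) and mixes its parameters with its ring neighbors in place of a central server. The topology, objective, step size, and update rule are all illustrative assumptions.

```python
# Toy sketch of decentralized min-max training: gossip averaging on a ring
# plus optimistic gradient descent-ascent on f(x, y) = x * y per worker.
# Illustrates the setting only; this is not the paper's algorithm.
import numpy as np

def decentralized_minmax(n_workers=8, steps=3000, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_workers)          # per-worker min player
    y = rng.normal(size=n_workers)          # per-worker max player
    gx_prev = np.zeros(n_workers)
    gy_prev = np.zeros(n_workers)
    for _ in range(steps):
        # Gossip step: each worker averages with its two ring neighbors,
        # replacing all communication with a central node.
        x = (np.roll(x, 1) + x + np.roll(x, -1)) / 3.0
        y = (np.roll(y, 1) + y + np.roll(y, -1)) / 3.0
        # Optimistic update (extrapolating with the previous gradient)
        # damps the oscillation that plain simultaneous GDA exhibits here.
        gx, gy = y, x                        # grad_x and grad_y of x * y
        x = x - lr * (2 * gx - gx_prev)
        y = y + lr * (2 * gy - gy_prev)
        gx_prev, gy_prev = gx, gy
    return x, y

x, y = decentralized_minmax()
print("worker disagreement:", np.ptp(x), "distance to saddle:", np.abs(x).max())
```

The gossip matrix is doubly stochastic, so it preserves the network average while contracting disagreement; the averaged iterates then follow an ordinary optimistic descent-ascent trajectory toward the saddle point at the origin.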
\ No newline at end of file diff --git a/data/2020/neurips/A Dictionary Approach to Domain-Invariant Learning in Deep Networks b/data/2020/neurips/A Dictionary Approach to Domain-Invariant Learning in Deep Networks new file mode 100644 index 0000000000..e81a5f7b68 --- /dev/null +++ b/data/2020/neurips/A Dictionary Approach to Domain-Invariant Learning in Deep Networks @@ -0,0 +1 @@ +In this paper, we consider domain-invariant deep learning by explicitly modeling domain shifts with only a small number of domain-specific parameters in a Convolutional Neural Network (CNN). By exploiting the observation that a convolutional filter can be well approximated as a linear combination of a small set of dictionary atoms, we show for the first time, both empirically and theoretically, that domain shifts can be effectively handled by decomposing a convolutional layer into a domain-specific atom layer and a domain-shared coefficient layer, while both remain convolutional. An input channel is first convolved spatially with each respective domain-specific dictionary atom to "absorb" domain variations, and the output channels are then linearly combined using common decomposition coefficients trained to promote shared semantics across domains. We use toy examples, rigorous analysis, and real-world examples with diverse datasets and architectures to show the proposed plug-in framework's effectiveness in cross-domain and joint-domain performance and in domain adaptation. With the proposed architecture, we need only a small set of dictionary atoms to model each additional domain, which adds a negligible number of additional parameters, typically a few hundred. \ No newline at end of file diff --git a/data/2020/neurips/A Discrete Variational Recurrent Topic Model without the Reparametrization Trick b/data/2020/neurips/A Discrete Variational Recurrent Topic Model without the Reparametrization Trick new file mode 100644 index 0000000000..0e720512eb --- /dev/null +++ b/data/2020/neurips/A Discrete Variational Recurrent Topic Model without the Reparametrization Trick @@ -0,0 +1 @@ +We show how to learn a neural topic model with discrete random variables---one that explicitly models each word's assigned topic---using neural variational inference that does not rely on stochastic backpropagation to handle the discrete variables. The model we utilize combines the expressive power of neural methods for representing sequences of text with the topic model's ability to capture global, thematic coherence. Using neural variational inference, we show improved perplexity and document understanding across multiple corpora. We examine the effect of prior parameters on both the model and the variational parameters, and demonstrate how our approach can compete with and surpass a popular topic model implementation on an automatic measure of topic quality. \ No newline at end of file diff --git a/data/2020/neurips/A Dynamical Central Limit Theorem for Shallow Neural Networks b/data/2020/neurips/A Dynamical Central Limit Theorem for Shallow Neural Networks new file mode 100644 index 0000000000..81566804a9 --- /dev/null +++ b/data/2020/neurips/A Dynamical Central Limit Theorem for Shallow Neural Networks @@ -0,0 +1 @@ +Recent theoretical work has characterized the dynamics of wide shallow neural networks trained via gradient descent in an asymptotic regime called the mean-field limit, in which the number of parameters tends towards infinity.
At initialization, the randomly sampled parameters lead to a deviation from the mean-field limit that is dictated by the classical Central Limit Theorem (CLT). However, the dynamics of training introduces correlations among the parameters, raising the question of how the fluctuations evolve during training. Here, we analyze the mean-field dynamics as a Wasserstein gradient flow and prove that the deviations from the mean-field limit scaled by the width, in the width-asymptotic limit, remain bounded throughout training. In particular, they eventually vanish in the CLT scaling if the mean-field dynamics converges to a measure that interpolates the training data. This observation has implications for both the approximation rate and the generalization: the upper bound we obtain is given by a Monte-Carlo type resampling error, which does not depend explicitly on the dimension. This bound motivates a regularization term on the 2-norm of the underlying measure, which is also connected to generalization via the variation-norm function spaces. \ No newline at end of file diff --git a/data/2020/neurips/A Fair Classifier Using Kernel Density Estimation b/data/2020/neurips/A Fair Classifier Using Kernel Density Estimation new file mode 100644 index 0000000000..dfa4701786 --- /dev/null +++ b/data/2020/neurips/A Fair Classifier Using Kernel Density Estimation @@ -0,0 +1 @@ +As machine learning becomes prevalent in a widening array of sensitive applications such as job hiring and criminal justice, one critical aspect in the design of machine learning classifiers is to ensure fairness: guaranteeing the irrelevance of a prediction to sensitive attributes such as gender and race. This work develops a kernel density estimation (KDE) methodology that faithfully respects the fairness constraint while yielding a tractable optimization problem with a favorable accuracy-fairness tradeoff. One key feature of this approach is that the fairness measure quantified via KDE can be expressed as a differentiable function w.r.t. model parameters, thereby enabling the use of standard gradient descent to readily solve the optimization problem of interest. This work focuses on classification tasks and two well-known measures of group fairness: demographic parity and equalized odds. We empirically show that our algorithm achieves greater or comparable performance relative to prior fair classifiers in the accuracy-fairness tradeoff, as well as in training stability, on both synthetic and benchmark real datasets. \ No newline at end of file diff --git a/data/2020/neurips/A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization b/data/2020/neurips/A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization new file mode 100644 index 0000000000..bd3095ab1f --- /dev/null +++ b/data/2020/neurips/A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization @@ -0,0 +1 @@ +Nonconvex sparse models have received significant attention in high-dimensional machine learning. In this paper, we study a new model consisting of a general convex or nonconvex objective and a variety of continuous nonconvex sparsity-inducing constraints. For this constrained model, we propose a novel proximal point algorithm that solves a sequence of convex subproblems with gradually relaxed constraint levels. Each subproblem, having a proximal point objective and a convex surrogate constraint, can be efficiently solved based on a fast routine for projection onto the surrogate constraint.
We establish the asymptotic convergence of the proposed algorithm to Karush-Kuhn-Tucker (KKT) solutions. We also establish new convergence complexities for achieving an approximate KKT solution when the objective is smooth/nonsmooth, deterministic/stochastic and convex/nonconvex, with complexity on a par with gradient descent for unconstrained optimization problems in the respective cases. To the best of our knowledge, this is the first study of first-order methods with complexity guarantees for nonconvex sparse-constrained problems. We perform numerical experiments to demonstrate the effectiveness of our new model and the efficiency of the proposed algorithm on large-scale problems. \ No newline at end of file diff --git a/data/2020/neurips/A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods b/data/2020/neurips/A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods new file mode 100644 index 0000000000..2b8bac416f --- /dev/null +++ b/data/2020/neurips/A Finite-Time Analysis of Two Time-Scale Actor-Critic Methods @@ -0,0 +1 @@ +Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms, where the actor uses the policy gradient to improve the learning policy and the critic uses temporal difference learning to estimate the policy gradient. Under the two time-scale learning rate schedule, the asymptotic convergence of AC has been well studied in the literature. However, the non-asymptotic convergence and finite sample complexity of actor-critic methods are largely open. In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods under the non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., $\|\nabla J(\boldsymbol{\theta})\|_2^2 \le \epsilon$) of the non-concave performance function $J(\boldsymbol{\theta})$, with $\mathcal{\tilde{O}}(\epsilon^{-2.5})$ sample complexity. To the best of our knowledge, this is the first work providing finite-time analysis and a sample complexity bound for two time-scale actor-critic methods. \ No newline at end of file diff --git a/data/2020/neurips/A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding b/data/2020/neurips/A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding new file mode 100644 index 0000000000..7fb8806eac --- /dev/null +++ b/data/2020/neurips/A Flexible Framework for Designing Trainable Priors with Adaptive Smoothing and Game Encoding @@ -0,0 +1 @@ +We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems, and whose architectures are derived from an optimization algorithm. We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions. This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end. The priors used in this framework include variants of total variation, Laplacian regularization, bilateral filtering, sparse coding on learned dictionaries, and non-local self-similarities. Our models are fully interpretable as well as parameter- and data-efficient.
Our experiments demonstrate their effectiveness on a large diversity of tasks, ranging from image denoising and compressed sensing for fMRI to dense stereo matching. \ No newline at end of file diff --git a/data/2020/neurips/A Game Theoretic Analysis of Additive Adversarial Attacks and Defenses b/data/2020/neurips/A Game Theoretic Analysis of Additive Adversarial Attacks and Defenses new file mode 100644 index 0000000000..ea36436bf9 --- /dev/null +++ b/data/2020/neurips/A Game Theoretic Analysis of Additive Adversarial Attacks and Defenses @@ -0,0 +1 @@ +Research in adversarial learning follows a cat-and-mouse game between attackers and defenders: attacks are proposed, they are mitigated by new defenses, and subsequently new attacks are proposed that break earlier defenses, and so on. However, it has remained unclear whether there are conditions under which no better attacks or defenses can be proposed. In this paper, we propose a game-theoretic framework for studying attacks and defenses which exist in equilibrium. Under a locally linear decision boundary model for the underlying binary classifier, we prove that the Fast Gradient Method attack and the Randomized Smoothing defense form a Nash Equilibrium. We then show how this equilibrium defense can be approximated given finitely many samples from a data-generating distribution, and derive a generalization bound for the performance of our approximation. \ No newline at end of file diff --git a/data/2020/neurips/A Game-Theoretic Analysis of the Empirical Revenue Maximization Algorithm with Endogenous Sampling b/data/2020/neurips/A Game-Theoretic Analysis of the Empirical Revenue Maximization Algorithm with Endogenous Sampling new file mode 100644 index 0000000000..4e7905979e --- /dev/null +++ b/data/2020/neurips/A Game-Theoretic Analysis of the Empirical Revenue Maximization Algorithm with Endogenous Sampling @@ -0,0 +1 @@ +Empirical Revenue Maximization (ERM) is one of the most important price learning algorithms in auction design: as the literature shows, it can learn approximately optimal reserve prices for revenue-maximizing auctioneers in both repeated auctions and uniform-price auctions. However, in these applications the agents who provide inputs to ERM have incentives to manipulate the inputs to lower the output price. We generalize the definition of an incentive-awareness measure proposed by Lavi et al. (2019) to quantify the reduction of ERM's output price due to a change of $m\ge 1$ out of $N$ input samples, and provide specific convergence rates of this measure to zero as $N$ goes to infinity for different types of input distributions. By adopting this measure, we construct an efficient, approximately incentive-compatible, and revenue-optimal learning algorithm using ERM in repeated auctions against non-myopic bidders, and show approximate group incentive-compatibility in uniform-price auctions. \ No newline at end of file diff --git a/data/2020/neurips/A General Large Neighborhood Search Framework for Solving Integer Linear Programs b/data/2020/neurips/A General Large Neighborhood Search Framework for Solving Integer Linear Programs new file mode 100644 index 0000000000..d871c75fe2 --- /dev/null +++ b/data/2020/neurips/A General Large Neighborhood Search Framework for Solving Integer Linear Programs @@ -0,0 +1 @@ +Mixed integer programming provides a unifying framework for solving a medley of hard combinatorial optimization problems of practical interest.
A mixed integer program (MIP) is typically solved using a linear programming (LP) based branch-and-bound algorithm. Primal heuristics are a key component of MIP solvers and enable finding good feasible solutions early in the tree search. The literature is rich with a variety of hybrid primal heuristics that combine both heuristic and exact methods. In this work, we propose a new supervised large neighborhood search heuristic for the general MIP, as well as a new analytical MIP-based primal heuristic for the linear ordering problem. We present our work in two parts. Part I: Supervised Neighborhood Selection for Mixed Integer Programs. Large neighborhood search (LNS) heuristics are employed as improvement procedures within the branch-and-bound algorithm. They formulate the neighborhoods as an auxiliary MIP with a restricted search space, which is then solved to search for an improving solution. The neighborhoods are typically defined by handcrafted rules, guided by human intuition and offline experimentation. Ideally, a neighborhood should instead be defined such that it has a high likelihood of success. We apply a data-driven approach to predict an ideal neighborhood for the neighborhood search. We propose a supervised large neighborhood search heuristic for general mixed integer programs and compare it with Relaxation Induced Neighborhood Search (RINS), a popular LNS heuristic. Our heuristic not only finds an improving solution more often but also improves the solver performance on key metrics. Part II: MIP-based Primal Heuristic for the Linear Ordering Problem. The linear ordering problem (LOP) is a quintessential combinatorial optimization problem and has been well studied in the literature. We present a new MIP-based primal heuristic for the LOP. The heuristic decomposes the LOP instance into sub-problems, albeit sub-optimal ones, solves each sub-problem optimally, and concatenates the partial solutions to construct a solution to the original problem instance. We present empirical results showing that our heuristic achieves good performance on benchmark instances. \ No newline at end of file diff --git a/data/2020/neurips/A General Method for Robust Learning from Batches b/data/2020/neurips/A General Method for Robust Learning from Batches new file mode 100644 index 0000000000..696b25599f --- /dev/null +++ b/data/2020/neurips/A General Method for Robust Learning from Batches @@ -0,0 +1 @@ +In many applications, data is collected in batches, some of which are corrupt or even adversarial. Recent work derived optimal robust algorithms for estimating discrete distributions in this setting. We consider a general framework of robust learning from batches, and determine the limits of both classification and distribution estimation over arbitrary, including continuous, domains. Building on these results, we derive the first robust agnostic computationally-efficient learning algorithms for piecewise-interval classification, and for piecewise-polynomial, monotone, log-concave, and Gaussian-mixture distribution estimation.
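As a toy illustration of why batch structure helps against adversarial corruption (a sketch of the general idea, not the paper's estimators), one can aggregate per-batch empirical histograms with a coordinate-wise median, which a minority of corrupt batches cannot move far:

```python
# Sketch: robust discrete-distribution estimation from batches via a
# coordinate-wise median of per-batch histograms (illustrative only).
import numpy as np

def robust_batch_histogram(batches, k):
    hists = np.stack([np.bincount(b, minlength=k) / len(b) for b in batches])
    med = np.median(hists, axis=0)   # a minority of bad batches cannot shift the median far
    return med / med.sum()           # renormalize into a distribution

rng = np.random.default_rng(1)
p = np.array([0.5, 0.3, 0.2])
good = [rng.choice(3, size=50, p=p) for _ in range(18)]
bad = [np.full(50, 2) for _ in range(6)]   # adversarial batches always report symbol 2
est = robust_batch_histogram(good + bad, k=3)
print("TV error:", 0.5 * np.abs(est - p).sum())
```

Averaging all batches would let the six corrupted batches bias the estimate by a quarter of their mass; the per-coordinate median caps their influence as long as they remain a minority.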
\ No newline at end of file diff --git a/data/2020/neurips/A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks b/data/2020/neurips/A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks new file mode 100644 index 0000000000..f931c61699 --- /dev/null +++ b/data/2020/neurips/A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks @@ -0,0 +1 @@ +A recent breakthrough in deep learning theory shows that the training of over-parameterized deep neural networks can be characterized by a kernel function called the \textit{neural tangent kernel} (NTK). However, it is known that this type of result does not perfectly match practice, as NTK-based analysis requires the network weights to stay very close to their initialization throughout training, and cannot handle regularizers or gradient noise. In this paper, we provide a generalized neural tangent kernel analysis and show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior. This implies that the training loss converges linearly up to a certain accuracy. We also establish a novel generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay. \ No newline at end of file diff --git a/data/2020/neurips/A Group-Theoretic Framework for Data Augmentation b/data/2020/neurips/A Group-Theoretic Framework for Data Augmentation new file mode 100644 index 0000000000..1c7beca9de --- /dev/null +++ b/data/2020/neurips/A Group-Theoretic Framework for Data Augmentation @@ -0,0 +1,2 @@ +Data augmentation is a widely used trick when training deep neural networks: in addition to the original data, properly transformed data are also added to the training set. However, to the best of our knowledge, a clear mathematical framework explaining the performance benefits of data augmentation is not available. +In this paper, we develop such a theoretical framework. We show that data augmentation is equivalent to an averaging operation over the orbits of a certain group that keeps the data distribution approximately invariant. We prove that it leads to variance reduction. We study empirical risk minimization and the examples of exponential families, linear regression, and certain two-layer neural networks. We also discuss how data augmentation could be used in problems with symmetry where other approaches are prevalent, such as in cryo-electron microscopy (cryo-EM). \ No newline at end of file diff --git a/data/2020/neurips/A Limitation of the PAC-Bayes Framework b/data/2020/neurips/A Limitation of the PAC-Bayes Framework new file mode 100644 index 0000000000..7c1c1bbe19 --- /dev/null +++ b/data/2020/neurips/A Limitation of the PAC-Bayes Framework @@ -0,0 +1,2 @@ +PAC-Bayes, introduced by McAllester ('98), is a useful framework for deriving generalization bounds. This framework has the flexibility of deriving distribution- and algorithm-dependent bounds, which are often tighter than VC-related uniform convergence bounds. In this manuscript we present a limitation of the PAC-Bayes framework. We demonstrate an easy learning task that is not amenable to a PAC-Bayes analysis. +Specifically, we consider the task of linear classification in 1D; it is well known that this task is learnable using just $O(\log(1/\delta)/\epsilon)$ examples.
On the other hand, we show that this fact cannot be proved using a PAC-Bayes analysis: for any algorithm that learns 1-dimensional linear classifiers there exists a (realizable) distribution for which the PAC-Bayes bound is arbitrarily large. \ No newline at end of file diff --git a/data/2020/neurips/A Local Temporal Difference Code for Distributional Reinforcement Learning b/data/2020/neurips/A Local Temporal Difference Code for Distributional Reinforcement Learning new file mode 100644 index 0000000000..d8e5712a5a --- /dev/null +++ b/data/2020/neurips/A Local Temporal Difference Code for Distributional Reinforcement Learning @@ -0,0 +1 @@ +Recent theoretical and experimental results suggest that the dopamine system implements distributional temporal difference backups, allowing learning of the entire distributions of the long-run values of states rather than just their expected values. However, the distributional codes explored so far rely on a complex imputation step which crucially depends on spatial non-locality: in order to compute reward prediction errors, units must know not only their own state but also the states of the other units. It is far from clear how these steps could be implemented in realistic neural circuits. Here, we introduce the Laplace code: a local temporal difference code for distributional reinforcement learning that is representationally powerful and computationally straightforward. The code decomposes value distributions and prediction errors across three separated dimensions: reward magnitude (related to distributional quantiles), temporal discounting (related to the Laplace transform of future rewards) and time horizon (related to eligibility traces). Besides lending itself to a local learning rule, the decomposition recovers the temporal evolution of the immediate reward distribution, indicating all possible rewards at all future times. This increases representational capacity and allows for temporally-flexible computations that immediately adjust to changing horizons or discount factors. \ No newline at end of file diff --git a/data/2020/neurips/A Loss Function for Generative Neural Networks Based on Watson's Perceptual Model b/data/2020/neurips/A Loss Function for Generative Neural Networks Based on Watson's Perceptual Model new file mode 100644 index 0000000000..22e7918233 --- /dev/null +++ b/data/2020/neurips/A Loss Function for Generative Neural Networks Based on Watson's Perceptual Model @@ -0,0 +1 @@ +Training Variational Autoencoders (VAEs) to generate realistic imagery requires a loss function that reflects human perception of image similarity. We propose such a loss function based on Watson's perceptual model, which computes a weighted distance in frequency space and accounts for luminance and contrast masking. We extend the model to color images, increase its robustness to translation by using the Fourier Transform, remove artifacts due to splitting the image into blocks, and make it differentiable. In experiments, VAEs trained with the new loss function generated realistic, high-quality image samples. Compared to using the Euclidean distance and the Structural Similarity Index, the images were less blurry; compared to deep neural network based losses, the new approach required fewer computational resources and generated images with fewer artifacts.
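The core idea of a frequency-weighted perceptual distance can be sketched in a few lines of numpy. This is a crude stand-in, not Watson's model: the Gaussian frequency weighting below is an invented placeholder for the contrast-sensitivity weights, and only the translation-robustness of FFT magnitudes is faithfully illustrated.

```python
# Sketch of a frequency-weighted image distance: compare FFT magnitudes with
# weights that favor low frequencies. The Gaussian weight is an illustrative
# assumption, not Watson's actual sensitivity model.
import numpy as np

def frequency_weighted_distance(img_a, img_b, bandwidth=0.15):
    A = np.abs(np.fft.fft2(img_a))   # magnitudes are invariant to circular shifts
    B = np.abs(np.fft.fft2(img_b))
    fy = np.fft.fftfreq(img_a.shape[0])[:, None]
    fx = np.fft.fftfreq(img_a.shape[1])[None, :]
    w = np.exp(-(fx ** 2 + fy ** 2) / (2 * bandwidth ** 2))  # down-weight high frequencies
    return float(np.sum(w * (A - B) ** 2))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
shifted = np.roll(img, 1, axis=1)                  # small translation: near-zero distance
noisy = img + 0.3 * rng.normal(size=img.shape)     # additive noise: large distance
print(frequency_weighted_distance(img, shifted), frequency_weighted_distance(img, noisy))
```

The circular shift leaves the distance at (numerically) zero while the noisy image scores high, which is the qualitative behavior the abstract attributes to the Fourier-based extension.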
\ No newline at end of file diff --git a/data/2020/neurips/A Matrix Chernoff Bound for Markov Chains and Its Application to Co-occurrence Matrices b/data/2020/neurips/A Matrix Chernoff Bound for Markov Chains and Its Application to Co-occurrence Matrices new file mode 100644 index 0000000000..5a4530ff53 --- /dev/null +++ b/data/2020/neurips/A Matrix Chernoff Bound for Markov Chains and Its Application to Co-occurrence Matrices @@ -0,0 +1 @@ +We prove a Chernoff-type bound for sums of matrix-valued random variables sampled via a regular (aperiodic and irreducible) finite Markov chain. Specifically, consider a random walk on a regular Markov chain and a Hermitian matrix-valued function on its state space. Our result gives exponentially decreasing bounds on the tail distributions of the extreme eigenvalues of the sample mean matrix. Our proof is based on the matrix expander (regular undirected graph) Chernoff bound [Garg et al. STOC ’18] and scalar Chernoff-Hoeffding bounds for Markov chains [Chung et al. STACS ’12]. Our matrix Chernoff bound for Markov chains can be applied to analyze the behavior of co-occurrence statistics for sequential data, which are common and important data signals in machine learning. We show that given a regular Markov chain with \(n\) states and mixing time \(\tau\), we need a trajectory of length \(O(\tau (\log{n} + \log{\tau})/\epsilon^2)\) to achieve an estimator of the co-occurrence matrix with error bound \(\epsilon\). We conduct several experiments, and the experimental results are consistent with the exponentially fast convergence rate from the theoretical analysis. Our result gives the first bound on the convergence rate of the co-occurrence matrix and the first sample complexity analysis in graph representation learning. \ No newline at end of file diff --git a/data/2020/neurips/A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs b/data/2020/neurips/A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs new file mode 100644 index 0000000000..8a85010e4a --- /dev/null +++ b/data/2020/neurips/A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs @@ -0,0 +1 @@ +This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e., where rewards and dynamics are linear in some known features), we provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. In a more general setting, when the feature dynamics are approximately linear and for arbitrary rewards, we propose a new approach for estimating stationary distributions with function approximation. We formulate this problem as finding the maximum-entropy distribution subject to matching feature expectations under empirical dynamics. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning. We demonstrate the effectiveness of the proposed OPE approaches in multiple environments.
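The core subroutine named in the abstract, finding a maximum-entropy distribution matching given feature expectations, can be sketched directly via its convex dual; the resulting distribution is in the exponential family with the features as sufficient statistics. This is a minimal illustration under our own toy setup, not the paper's estimator.

```python
# Sketch: maximum-entropy distribution over a finite state space matching
# feature expectations, via the convex dual
#   min_w  log sum_s exp(phi(s) @ w) - w @ target.
# The solution p(s) is proportional to exp(phi(s) @ w*): an exponential family
# whose sufficient statistics are the features.
import numpy as np
from scipy.optimize import minimize

def maxent_match(phi, target):
    def dual(w):
        z = phi @ w
        m = z.max()
        return m + np.log(np.exp(z - m).sum()) - w @ target  # stable log-partition
    w = minimize(dual, np.zeros(phi.shape[1]), method="BFGS").x
    z = phi @ w
    p = np.exp(z - z.max())
    return p / p.sum()

rng = np.random.default_rng(0)
phi = rng.normal(size=(6, 2))      # features of 6 states (toy)
q = rng.dirichlet(np.ones(6))      # an unknown stationary distribution
p = maxent_match(phi, phi.T @ q)   # recover a distribution matching q's feature means
print("feature gap:", np.abs(phi.T @ p - phi.T @ q).max())
```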
\ No newline at end of file diff --git a/data/2020/neurips/A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings b/data/2020/neurips/A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings new file mode 100644 index 0000000000..c69f33b1ed --- /dev/null +++ b/data/2020/neurips/A Measure-Theoretic Approach to Kernel Conditional Mean Embeddings @@ -0,0 +1 @@ +We present a new operator-free, measure-theoretic definition of the conditional mean embedding as a random variable taking values in a reproducing kernel Hilbert space. While the kernel mean embedding of marginal distributions has been defined rigorously, the existing operator-based approach to the conditional version lacks a rigorous definition and depends on strong assumptions that hinder its analysis. Our definition does not impose any of the assumptions that the operator-based counterpart requires. We derive a natural regression interpretation to obtain empirical estimates, and provide a thorough analysis of its properties, including universal consistency. As natural by-products, we obtain the conditional analogues of the Maximum Mean Discrepancy and the Hilbert-Schmidt Independence Criterion, and demonstrate their behaviour via simulations. \ No newline at end of file diff --git a/data/2020/neurips/A Non-Asymptotic Analysis for Stein Variational Gradient Descent b/data/2020/neurips/A Non-Asymptotic Analysis for Stein Variational Gradient Descent new file mode 100644 index 0000000000..1c3116c859 --- /dev/null +++ b/data/2020/neurips/A Non-Asymptotic Analysis for Stein Variational Gradient Descent @@ -0,0 +1 @@ +We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi\propto e^{-V}$ on $\mathbb{R}^d$. In the population limit, SVGD performs gradient descent in the space of probability distributions on the KL divergence with respect to $\pi$, where the gradient is smoothed through a kernel integral operator. In this paper, we provide a novel finite-time analysis for the SVGD algorithm. We obtain a descent lemma establishing that the algorithm decreases the objective at each iteration, and provably converges, with less restrictive assumptions on the step size than required in earlier analyses. We further provide a guarantee on the convergence rate in Kullback-Leibler divergence, assuming $\pi$ satisfies a Stein log-Sobolev inequality as in Duncan et al. (2019), which takes into account the geometry induced by the smoothed KL gradient. \ No newline at end of file diff --git a/data/2020/neurips/A Novel Approach for Constrained Optimization in Graphical Models b/data/2020/neurips/A Novel Approach for Constrained Optimization in Graphical Models new file mode 100644 index 0000000000..2ba092297f --- /dev/null +++ b/data/2020/neurips/A Novel Approach for Constrained Optimization in Graphical Models @@ -0,0 +1 @@ +We consider the following constrained maximization problem in discrete probabilistic graphical models (PGMs). Given two (possibly identical) PGMs $M_1$ and $M_2$ defined over the same set of variables and a real number $q$, find an assignment of values to all variables such that the probability of the assignment is maximized w.r.t. $M_1$ and is smaller than $q$ w.r.t. $M_2$. We show that several explanation and robust estimation queries over graphical models are special cases of this problem. We propose a class of approximate algorithms for solving this problem.
Our algorithms are based on a graph concept called a $k$-separator and on heuristic algorithms for the multiple-choice knapsack and subset-sum problems. Our experiments show that our algorithms are superior to the following approach: encode the problem as a mixed integer linear program (MILP) and solve the latter using a state-of-the-art MILP solver such as SCIP. \ No newline at end of file diff --git a/data/2020/neurips/A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances b/data/2020/neurips/A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances new file mode 100644 index 0000000000..34a713e2f1 --- /dev/null +++ b/data/2020/neurips/A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances @@ -0,0 +1 @@ +In recent years, we have witnessed tremendous progress in deep reinforcement learning (RL) for tasks such as Go, Chess, video games, and robot control. Nevertheless, other combinatorial domains, such as AI planning, still pose considerable challenges for RL approaches. The key difficulty in those domains is that a positive reward signal becomes {\em exponentially rare} as the minimal solution length increases, so an RL approach loses its training signal. There has been promising recent progress using a curriculum-driven learning approach designed to solve a single hard instance. We present a novel {\em automated} curriculum approach that dynamically selects from a pool of unlabeled training instances of varying task complexity guided by our {\em difficulty quantum momentum} strategy. We show how the smoothness of the task hardness impacts the final learning results. In particular, as the size of the instance pool increases, the ``hardness gap'' decreases, which facilitates a smoother automated curriculum based learning process. Our automated curriculum approach dramatically improves upon the previous approaches. We show our results on Sokoban, which is a traditional PSPACE-complete planning problem and presents a great challenge even for specialized solvers. Our RL agent can solve hard instances that are far out of reach for any previous state-of-the-art Sokoban solver. In particular, our approach can uncover plans that require hundreds of steps, while the best previous search methods would take many years of computing time to solve such instances. In addition, we show that we can further boost the RL performance with an intricate coupling of our automated curriculum approach with a curiosity-driven search strategy and a graph neural net representation. \ No newline at end of file diff --git a/data/2020/neurips/A Randomized Algorithm to Reduce the Support of Discrete Measures b/data/2020/neurips/A Randomized Algorithm to Reduce the Support of Discrete Measures new file mode 100644 index 0000000000..637ed0517d --- /dev/null +++ b/data/2020/neurips/A Randomized Algorithm to Reduce the Support of Discrete Measures @@ -0,0 +1 @@ +Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms and has the same mean when integrated against each of the $n$ functions. If $N \gg n$, this results in a huge reduction of complexity. We give a simple geometric characterization of barycenters via negative cones and derive a randomized algorithm that computes this new measure by "greedy geometric sampling".
We then study its properties, and benchmark it on synthetic and real-world data to show that it can be very beneficial in the $N\gg n$ regime. A Python implementation is available at \url{this https URL}. \ No newline at end of file diff --git a/data/2020/neurips/A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection b/data/2020/neurips/A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection new file mode 100644 index 0000000000..49d13aa812 --- /dev/null +++ b/data/2020/neurips/A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection @@ -0,0 +1 @@ +We propose \textit{average Localisation-Recall-Precision} (aLRP), a unified, bounded, balanced and ranking-based loss function for both classification and localisation tasks in object detection. aLRP extends the Localisation-Recall-Precision (LRP) performance metric (Oksuz et al., 2018), inspired by how Average Precision (AP) Loss extends precision to a ranking-based loss function for classification (Chen et al., 2020). aLRP has the following distinct advantages: (i) aLRP is the first ranking-based loss function for both classification and localisation tasks. (ii) Thanks to using ranking for both tasks, aLRP naturally enforces high-quality localisation for high-precision classification. (iii) aLRP provides provable balance between positives and negatives. (iv) Compared to on average $\sim$6 hyperparameters in the loss functions of state-of-the-art detectors, aLRP Loss has only one hyperparameter, which we did not tune in practice. On the COCO dataset, aLRP Loss improves on its ranking-based predecessor, AP Loss, by up to around $5$ AP points, achieves $48.9$ AP without test time augmentation, and outperforms all one-stage detectors. Code available at: this https URL. \ No newline at end of file diff --git a/data/2020/neurips/A Robust Functional EM Algorithm for Incomplete Panel Count Data b/data/2020/neurips/A Robust Functional EM Algorithm for Incomplete Panel Count Data new file mode 100644 index 0000000000..6439e382bf --- /dev/null +++ b/data/2020/neurips/A Robust Functional EM Algorithm for Incomplete Panel Count Data @@ -0,0 +1 @@ +Panel count data describes aggregated counts of recurrent events observed at discrete time points. To understand the dynamics of health behaviors and predict future negative events, the field of quantitative behavioral research has evolved to increasingly rely upon panel count data collected via multiple self-reports, for example about the frequency of smoking, collected using in-the-moment surveys on mobile devices. However, missing reports are common and present a major barrier to downstream statistical learning. As a first step, under a missing-completely-at-random (MCAR) assumption, we propose a simple yet widely applicable functional EM algorithm to estimate the counting process mean function, which is of central interest to behavioral scientists. The proposed approach wraps several popular panel count inference methods, seamlessly deals with incomplete counts, and is robust to misspecification of the Poisson process assumption. Theoretical analysis of the proposed algorithm provides finite-sample guarantees by extending parametric EM theory [3, 34] to the general non-parametric setting. We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data.
We also discuss useful extensions to address deviations from the MCAR assumption and covariate effects. \ No newline at end of file diff --git a/data/2020/neurips/A Scalable Approach for Privacy-Preserving Collaborative Machine Learning b/data/2020/neurips/A Scalable Approach for Privacy-Preserving Collaborative Machine Learning new file mode 100644 index 0000000000..fd26aae946 --- /dev/null +++ b/data/2020/neurips/A Scalable Approach for Privacy-Preserving Collaborative Machine Learning @@ -0,0 +1 @@ +We consider a collaborative learning scenario in which multiple data-owners wish to jointly train a logistic regression model, while keeping their individual datasets private from the other parties. We propose COPML, a fully decentralized training framework that achieves scalability and privacy protection simultaneously. The key idea of COPML is to securely encode the individual datasets to distribute the computation load effectively across many parties and to perform the training computations as well as the model updates in a distributed manner on the securely encoded data. We provide the privacy analysis of COPML and prove its convergence. Furthermore, we experimentally demonstrate that COPML can achieve significant speedup in training over the benchmark protocols. Our protocol provides strong statistical privacy guarantees against colluding parties (adversaries) with unbounded computational power, while achieving up to a $16\times$ speedup in training time over the benchmark protocols. \ No newline at end of file diff --git a/data/2020/neurips/A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees b/data/2020/neurips/A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees new file mode 100644 index 0000000000..94282f5ba8 --- /dev/null +++ b/data/2020/neurips/A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees @@ -0,0 +1 @@ +Several recent publications report advances in training optimal decision trees (ODT) using mixed-integer programs (MIP), due to algorithmic advances in integer programming and a growing interest in addressing the inherent suboptimality of heuristic approaches such as CART. In this paper, we propose a novel MIP formulation, based on a 1-norm support vector machine model, to train a multivariate ODT for classification problems. We provide cutting plane techniques that tighten the linear relaxation of the MIP formulation, in order to improve run times to reach optimality. Using 36 data-sets from the University of California Irvine Machine Learning Repository, we demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy across the data-sets. We provide a scalable framework to train multivariate ODT on large data-sets by introducing a novel linear programming (LP) based data selection method to choose a subset of the data for training. Our method is able to routinely handle large data-sets with more than 7,000 sample points and outperforms heuristic methods and other MIP-based techniques. We present results on data-sets containing up to 245,000 samples. Existing MIP-based methods do not scale well on training data-sets beyond 5,500 samples.
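The 1-norm SVM the formulation builds on is itself a linear program, which can be sketched in a few lines. This is our own minimal version (variable names and the scipy-based solver choice are ours, and it is a single linear split, not the authors' full tree MIP): the 1-norm of the weights is linearized by splitting w into nonnegative parts u and v.

```python
# Sketch of a 1-norm SVM as a linear program:
#   min ||w||_1 + C * sum(xi)  s.t.  y_i (w @ x_i + b) >= 1 - xi_i,  xi >= 0,
# with w = u - v and u, v >= 0 to linearize the 1-norm.
import numpy as np
from scipy.optimize import linprog

def one_norm_svm(X, y, C=1.0):
    n, d = X.shape
    # decision vector: [u (d), v (d), b (1), xi (n)]
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    # margin constraints rewritten as A_ub @ z <= b_ub
    A_ub = np.hstack([-y[:, None] * X, y[:, None] * X, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w = res.x[:d] - res.x[d:2 * d]
    return w, res.x[2 * d]

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w, b = one_norm_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```

The 1-norm objective encourages sparse split weights, which is one reason it is attractive as a base model for interpretable multivariate splits.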
\ No newline at end of file diff --git a/data/2020/neurips/A Self-Tuning Actor-Critic Algorithm b/data/2020/neurips/A Self-Tuning Actor-Critic Algorithm new file mode 100644 index 0000000000..35bda8eeec --- /dev/null +++ b/data/2020/neurips/A Self-Tuning Actor-Critic Algorithm @@ -0,0 +1 @@ +Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain. In this paper, we take a step towards addressing this issue by using metagradients to automatically adapt hyperparameters online via meta-gradient descent (Xu et al., 2018). We apply our algorithm, Self-Tuning Actor-Critic (STAC), to self-tune all the differentiable hyperparameters of an actor-critic loss function, to discover auxiliary tasks, and to improve off-policy learning using a novel leaky V-trace operator. STAC is simple to use, sample efficient, and does not require a significant increase in compute. Ablation studies show that the overall performance of STAC improves as more hyperparameters are adapted. When applied to the Arcade Learning Environment (Bellemare et al., 2012), STAC improved the median human normalized score in $200$M steps from $243\%$ to $364\%$. When applied to the DM Control suite (Tassa et al., 2018), STAC improved the mean score in $30$M steps from $217$ to $389$ when learning with features, from $108$ to $202$ when learning from pixels, and from $195$ to $295$ in the Real-World Reinforcement Learning Challenge (Dulac-Arnold et al., 2020). \ No newline at end of file diff --git a/data/2020/neurips/A Simple Language Model for Task-Oriented Dialogue b/data/2020/neurips/A Simple Language Model for Task-Oriented Dialogue new file mode 100644 index 0000000000..1fca3d1040 --- /dev/null +++ b/data/2020/neurips/A Simple Language Model for Task-Oriented Dialogue @@ -0,0 +1 @@ +Task-oriented dialogue is often decomposed into three tasks: understanding user input, deciding actions, and generating a response. While such decomposition might suggest a dedicated model for each sub-task, we find a simple, unified approach leads to state-of-the-art performance on the MultiWOZ dataset. SimpleTOD is a simple approach to task-oriented dialogue that uses a single causal language model trained on all sub-tasks recast as a single sequence prediction problem. This allows SimpleTOD to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2. SimpleTOD improves over the prior state-of-the-art by 0.49 points in joint goal accuracy for dialogue state tracking. More impressively, SimpleTOD also improves the main metrics used to evaluate action decisions and response generation in an end-to-end setting for task-oriented dialog systems: inform rate by 8.1 points, success rate by 9.7 points, and combined score by 7.2 points. \ No newline at end of file diff --git a/data/2020/neurips/A Simple and Efficient Smoothing Method for Faster Optimization and Local Exploration b/data/2020/neurips/A Simple and Efficient Smoothing Method for Faster Optimization and Local Exploration new file mode 100644 index 0000000000..469f90378f --- /dev/null +++ b/data/2020/neurips/A Simple and Efficient Smoothing Method for Faster Optimization and Local Exploration @@ -0,0 +1 @@ +This work proposes a novel smoothing method, called Bend, Mix and Release (BMR), that extends two well-known smooth approximations from the convex optimization literature: randomized smoothing and the Moreau envelope.
The BMR smoothing method allows one to trade off between the computational simplicity of randomized smoothing (RS) and the approximation efficiency of the Moreau envelope (ME). More specifically, we show that BMR achieves up to a $\sqrt{d}$ multiplicative improvement on the approximation error of RS, where $d$ is the dimension of the search space, while being less computationally intensive than the ME. For non-convex objectives, BMR also has the desirable property of widening local minima, allowing optimization methods to reach small cracks and crevices of extremely irregular and non-convex functions, while being well-suited to a distributed setting. This novel smoothing method is then used to improve first-order non-smooth optimization (both convex and non-convex) by allowing for a local exploration of the search space. More specifically, our analysis sheds light on the similarities between evolution strategies and BMR, creating a link between exploration strategies of zeroth-order methods and the regularity of first-order optimization problems. Finally, we demonstrate the impact of BMR through synthetic experiments. \ No newline at end of file diff --git a/data/2020/neurips/A Single Recipe for Online Submodular Maximization with Adversarial or Stochastic Constraints b/data/2020/neurips/A Single Recipe for Online Submodular Maximization with Adversarial or Stochastic Constraints new file mode 100644 index 0000000000..93a03de5d8 --- /dev/null +++ b/data/2020/neurips/A Single Recipe for Online Submodular Maximization with Adversarial or Stochastic Constraints @@ -0,0 +1 @@ +In this paper, we consider an online optimization problem in which the reward functions are DR-submodular and, in addition to maximizing the total reward, the sequence of decisions must satisfy some convex constraints on average. Specifically, at each round $t \in \{1, \ldots, T\}$, upon committing to an action $x_t$, a DR-submodular utility function $f_t(\cdot)$ and a convex constraint function $g_t(\cdot)$ are revealed, and the goal is to maximize the overall utility while ensuring that the average of the constraint functions, $\frac{1}{T}\sum_{t=1}^{T} g_t(x_t)$, is non-positive. Such cumulative constraints arise naturally in applications where the average resource consumption is required to remain below a prespecified threshold. We study this problem under an adversarial model and a stochastic model for the convex constraints, where the functions $g_t$ can vary arbitrarily or according to an i.i.d. process over the time slots $t \in \{1, \ldots, T\}$, respectively. We propose a single algorithm which achieves sub-linear (with respect to $T$) regret as well as sub-linear constraint violation bounds in both settings, without prior knowledge of the regime. Prior works have studied this problem in the special case of linear constraint functions. Our results not only improve upon the existing bounds under linear cumulative constraints, but also give the first sub-linear bounds for general convex long-term constraints.
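A generic primal-dual recipe conveys the flavor of online maximization with long-term constraints: ascend the reward penalized by a dual price, and raise the price whenever the budget is violated. In the sketch below a simple concave reward stands in for a DR-submodular one, and all functions and step sizes are our toy assumptions, not the paper's single algorithm.

```python
# Primal-dual sketch of online maximization under a long-term budget constraint.
# f_t(x) = a @ x - 0.5 ||x||^2 (concave stand-in), g_t(x) = c @ x - 1 (budget).
import numpy as np

def online_primal_dual(T=2000, d=3, eta=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = np.full(d, 0.5)
    lam = 0.0                                  # dual price of the budget
    avg_violation = 0.0
    for t in range(T):
        a = rng.uniform(0.5, 1.5, size=d)      # reward revealed this round
        c = rng.uniform(0.0, 2.0, size=d)      # constraint revealed this round
        # primal step: ascend the Lagrangian f_t(x) - lam * g_t(x), then project
        x = np.clip(x + eta * ((a - x) - lam * c), 0.0, 1.0)
        violation = c @ x - 1.0
        lam = max(0.0, lam + eta * violation)  # dual step: price rises when violated
        avg_violation += violation / T
    return x, lam, avg_violation

x, lam, v = online_primal_dual()
print("final action:", x, "avg constraint value:", v)
```

The dual variable acts as an adaptive penalty: sustained violation drives it up until the primal iterates are pushed back inside the budget on average.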
\ No newline at end of file diff --git a/data/2020/neurips/A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems b/data/2020/neurips/A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems new file mode 100644 index 0000000000..b1d2728f7e --- /dev/null +++ b/data/2020/neurips/A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems @@ -0,0 +1 @@ +The nonconvex-concave min-max problem arises in many machine learning applications, including minimizing a pointwise maximum of a set of nonconvex functions and robust adversarial training of neural networks. A popular approach to solving this problem is the gradient descent-ascent (GDA) algorithm, which unfortunately can exhibit oscillation in the case of nonconvexity. In this paper, we introduce a "smoothing" scheme which can be combined with GDA to stabilize the oscillation and ensure convergence to a stationary solution. We prove that the stabilized GDA algorithm can achieve an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions. Moreover, the smoothed GDA algorithm achieves an $O(1/\epsilon^4)$ iteration complexity for general nonconvex-concave problems. Extensions of this stabilized GDA algorithm to multi-block cases are presented. To the best of our knowledge, this is the first algorithm to achieve $O(1/\epsilon^2)$ for a class of nonconvex-concave problems. We illustrate the practical efficiency of the stabilized GDA algorithm on robust training. \ No newline at end of file diff --git a/data/2020/neurips/A Spectral Energy Distance for Parallel Speech Synthesis b/data/2020/neurips/A Spectral Energy Distance for Parallel Speech Synthesis new file mode 100644 index 0000000000..6610386e5e --- /dev/null +++ b/data/2020/neurips/A Spectral Energy Distance for Parallel Speech Synthesis @@ -0,0 +1 @@ +Speech synthesis is an important practical generative modeling problem that has seen great progress over the last few years, with likelihood-based autoregressive neural models now outperforming traditional concatenative systems. A downside of such autoregressive models is that they require executing tens of thousands of sequential operations per second of generated audio, making them ill-suited for deployment on specialized deep learning hardware. Here, we propose a new learning method that allows us to train highly parallel models of speech, without requiring access to an analytical likelihood function. Our approach is based on a generalized energy distance between the distributions of the generated and real audio. This spectral energy distance is a proper scoring rule with respect to the distribution over magnitude-spectrograms of the generated waveform audio and offers statistical consistency guarantees. The distance can be calculated from minibatches without bias, and does not involve adversarial learning, yielding a stable and consistent method for training implicit generative models. Empirically, we achieve state-of-the-art generation quality among implicit generative models, as judged by the recently-proposed cFDSD metric. When combining our method with adversarial techniques, we also improve upon the recently-proposed GAN-TTS model in terms of Mean Opinion Score as judged by trained human evaluators.
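A minimal numpy sketch of an energy distance computed on magnitude spectrograms follows. The STFT parameters, pairing scheme, and distance choice are our simplifications, not the paper's exact loss; the real-real term of the energy distance is dropped because it is constant with respect to the generator.

```python
# Sketch of a generalized energy distance on magnitude spectrograms:
#   2 E||s(g) - s(r)|| - E||s(g) - s(g')||,
# with g, g' two independent generator batches and r real audio.
import numpy as np

def mag_spectrogram(x, n_fft=128, hop=64):
    frames = np.stack([x[i:i + n_fft] * np.hanning(n_fft)
                       for i in range(0, len(x) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def spectral_energy_distance(gen, gen2, real):
    d = lambda a, b: np.linalg.norm(mag_spectrogram(a) - mag_spectrogram(b))
    attract = np.mean([d(g, r) for g, r in zip(gen, real)])   # pull toward data
    repel = np.mean([d(g, g2) for g, g2 in zip(gen, gen2)])   # keep sample diversity
    return 2 * attract - repel

rng = np.random.default_rng(0)
real = [np.sin(0.2 * np.arange(1024)) + 0.1 * rng.normal(size=1024) for _ in range(8)]
gen = [rng.normal(size=1024) for _ in range(8)]
gen2 = [rng.normal(size=1024) for _ in range(8)]
print(spectral_energy_distance(gen, gen2, real))
```

The repulsive term between two independent generator batches is what distinguishes a proper energy distance from a plain spectrogram regression and discourages mode collapse.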
\ No newline at end of file diff --git a/data/2020/neurips/A Statistical Framework for Low-bitwidth Training of Deep Neural Networks b/data/2020/neurips/A Statistical Framework for Low-bitwidth Training of Deep Neural Networks new file mode 100644 index 0000000000..f876fae692 --- /dev/null +++ b/data/2020/neurips/A Statistical Framework for Low-bitwidth Training of Deep Neural Networks @@ -0,0 +1 @@ +Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties. In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms. We view the quantized gradient of FQT as a stochastic estimator of its full-precision counterpart, a procedure known as quantization-aware training (QAT). We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance. Inspired by these theoretical results, we develop two novel gradient quantizers, and we show that these have smaller variance than the existing per-tensor quantizer. For training ResNet-50 on ImageNet, our 5-bit block Householder quantizer achieves only 0.5% validation accuracy loss relative to QAT, comparable to the existing INT8 baseline. \ No newline at end of file diff --git a/data/2020/neurips/A Statistical Mechanics Framework for Task-Agnostic Sample Design in Machine Learning b/data/2020/neurips/A Statistical Mechanics Framework for Task-Agnostic Sample Design in Machine Learning new file mode 100644 index 0000000000..d8550b9761 --- /dev/null +++ b/data/2020/neurips/A Statistical Mechanics Framework for Task-Agnostic Sample Design in Machine Learning @@ -0,0 +1 @@ +In this paper, we present a statistical mechanics framework to understand the effect of the sampling properties of training data on the generalization gap of machine learning (ML) algorithms. We connect the generalization gap to the spatial properties of a sample design characterized by the pair correlation function (PCF). In particular, we express the generalization gap in terms of the power spectra of the sample design and of the function to be learned. Using this framework, we show that space-filling sample designs, such as blue noise and Poisson disk sampling, which optimize spectral properties, outperform random designs in terms of the generalization gap, and we characterize this gain in closed form. Our analysis also sheds light on design principles for constructing optimal task-agnostic sample designs that minimize the generalization gap. We corroborate our findings using regression experiments with neural networks on: a) synthetic functions, and b) a complex scientific simulator for inertial confinement fusion (ICF).
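A naive dart-throwing Poisson disk sampler makes the space-filling property tangible: unlike i.i.d. sampling, it enforces a minimum spacing between points. The sampler and the spacing diagnostic below are a crude sketch for illustration, not an optimized blue-noise generator.

```python
# Naive dart-throwing Poisson disk sampling in [0,1]^2, compared with i.i.d.
# random sampling via the minimum pairwise distance (space-filling designs
# keep it bounded away from zero).
import numpy as np

def poisson_disk(n, r, seed=0, max_tries=100000):
    rng = np.random.default_rng(seed)
    pts = []
    for _ in range(max_tries):
        p = rng.uniform(size=2)
        # accept only if at least r away from every accepted point
        if all(np.linalg.norm(p - q) >= r for q in pts):
            pts.append(p)
            if len(pts) == n:
                break
    return np.array(pts)

def min_spacing(P):
    D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    return D.min()

n = 64
random_pts = np.random.default_rng(0).uniform(size=(n, 2))
disk_pts = poisson_disk(n, r=0.08)
print("min spacing, random:", min_spacing(random_pts),
      "poisson disk:", min_spacing(disk_pts))
```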
\ No newline at end of file diff --git a/data/2020/neurips/A Stochastic Path Integral Differential EstimatoR Expectation Maximization Algorithm b/data/2020/neurips/A Stochastic Path Integral Differential EstimatoR Expectation Maximization Algorithm new file mode 100644 index 0000000000..93c0589edd --- /dev/null +++ b/data/2020/neurips/A Stochastic Path Integral Differential EstimatoR Expectation Maximization Algorithm @@ -0,0 +1 @@ +The Expectation Maximization (EM) algorithm is of key importance for inference in latent variable models, including mixtures of regressors and experts and models with missing observations. This paper introduces a novel EM algorithm, called SPIDER-EM, for inference from a training set of size $n$, $n \gg 1$. At the core of our algorithm is an estimator of the full conditional expectation in the E-step, adapted from the stochastic path-integrated differential estimator (SPIDER) technique. We derive finite-time complexity bounds for smooth non-convex likelihoods: we show that for convergence to an $\epsilon$-approximate stationary point, the complexity scales as $K_{\mathrm{Opt}}(n, \epsilon) = O(\epsilon^{-1})$ and $K_{\mathrm{CE}}(n, \epsilon) = n + \sqrt{n}\, O(\epsilon^{-1})$, where $K_{\mathrm{Opt}}(n, \epsilon)$ and $K_{\mathrm{CE}}(n, \epsilon)$ are respectively the number of M-steps and the number of per-sample conditional expectation evaluations. This improves over the state-of-the-art algorithms. Numerical results support our findings. \ No newline at end of file diff --git a/data/2020/neurips/A Study on Encodings for Neural Architecture Search b/data/2020/neurips/A Study on Encodings for Neural Architecture Search new file mode 100644 index 0000000000..cef2b48831 --- /dev/null +++ b/data/2020/neurips/A Study on Encodings for Neural Architecture Search @@ -0,0 +1,2 @@ +Neural architecture search (NAS) has been extensively studied in the past few years. A popular approach is to represent each neural architecture in the search space as a directed acyclic graph (DAG), and then search over all DAGs by encoding the adjacency matrix and list of operations as a set of hyperparameters. Recent work has demonstrated that even small changes to the way each architecture is encoded can have a significant effect on the performance of NAS algorithms. +In this work, we present the first formal study on the effect of architecture encodings for NAS, including a theoretical grounding and an empirical study. First, we formally define architecture encodings and give a theoretical characterization of the scalability of the encodings we study. Then we identify the main encoding-dependent subroutines which NAS algorithms employ, running experiments to show which encodings work best with each subroutine for many popular algorithms. The experiments act as an ablation study for prior work, disentangling the algorithmic and encoding-based contributions, as well as a guideline for future work. Our results demonstrate that NAS encodings are an important design decision which can have a significant impact on overall performance. Our code is available at this https URL. \ No newline at end of file diff --git a/data/2020/neurips/A Theoretical Framework for Target Propagation b/data/2020/neurips/A Theoretical Framework for Target Propagation new file mode 100644 index 0000000000..7cec602bcf --- /dev/null +++ b/data/2020/neurips/A Theoretical Framework for Target Propagation @@ -0,0 +1 @@ +The success of deep learning, a brain-inspired form of AI, has sparked interest in understanding how the brain could similarly learn across multiple layers of neurons.
However, the majority of biologically plausible learning algorithms have not yet reached the performance of backpropagation (BP), nor are they built on strong theoretical foundations. Here, we analyze target propagation (TP), a popular but not yet fully understood alternative to BP, from the standpoint of mathematical optimization. Our theory shows that TP is closely related to Gauss-Newton optimization and thus substantially differs from BP. Furthermore, our analysis reveals a fundamental limitation of difference target propagation (DTP), a well-known variant of TP, in the realistic scenario of non-invertible neural networks. We provide a first solution to this problem through a novel reconstruction loss that improves feedback weight training, while simultaneously introducing architectural flexibility by allowing for direct feedback connections from the output to each hidden layer. Our theory is corroborated by experimental results that show significant improvements in performance and in the alignment of forward weight updates with loss gradients, compared to DTP. \ No newline at end of file diff --git a/data/2020/neurips/A Tight Lower Bound and Efficient Reduction for Swap Regret b/data/2020/neurips/A Tight Lower Bound and Efficient Reduction for Swap Regret new file mode 100644 index 0000000000..1a34e0a514 --- /dev/null +++ b/data/2020/neurips/A Tight Lower Bound and Efficient Reduction for Swap Regret @@ -0,0 +1 @@ +Swap regret, a generic performance measure of online decision-making algorithms, plays an important role in the theory of repeated games, along with a close connection to correlated equilibria in strategic games. This paper shows an $\Omega(\sqrt{TN \log N})$ lower bound for swap regret, where $T$ and $N$ denote the numbers of time steps and available actions, respectively. Our lower bound is tight up to a constant, and resolves an open problem mentioned, e.g., in the book by Nisan et al. [28]. In addition, we present a computationally efficient reduction method that converts no-external-regret algorithms to no-swap-regret algorithms. This method can be applied not only to the full-information setting but also to the bandit setting and provides a better regret bound than previous results. \ No newline at end of file diff --git a/data/2020/neurips/A Topological Filter for Learning with Label Noise b/data/2020/neurips/A Topological Filter for Learning with Label Noise new file mode 100644 index 0000000000..f1df4e013a --- /dev/null +++ b/data/2020/neurips/A Topological Filter for Learning with Label Noise @@ -0,0 +1 @@ +Noisy labels can impair the performance of deep neural networks. To tackle this problem, in this paper, we propose a new method for filtering label noise. Unlike most existing methods relying on the posterior probability of a noisy classifier, we focus on the much richer spatial behavior of data in the latent representational space. By leveraging the high-order topological information of data, we are able to collect most of the clean data and train a high-quality model. Theoretically, we prove that this topological approach is guaranteed to collect the clean data with high probability. Empirical results show that our method outperforms the state of the art and is robust to a broad spectrum of noise types and levels.
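The topological-filter abstract above exploits the spatial behavior of data in latent space rather than classifier confidence. A toy numpy proxy for that intuition, flagging a label as noisy when it disagrees with most of its k nearest neighbors in feature space (a simplified stand-in, not the paper's topology-based method):

    import numpy as np

    def knn_noise_filter(features, labels, k=10):
        """Keep a sample only if at least half of its k nearest neighbors
        in the latent space share its label."""
        d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)              # exclude self-matches
        nn = np.argsort(d2, axis=1)[:, :k]        # k nearest neighbors
        agree = (labels[nn] == labels[:, None]).mean(axis=1)
        return agree >= 0.5                       # boolean 'keep' mask

    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
    y = np.repeat([0, 1], 200)
    noisy = y.copy()
    flip = rng.choice(400, 80, replace=False)     # 20% symmetric label noise
    noisy[flip] = 1 - noisy[flip]
    keep = knn_noise_filter(X, noisy)
    print("kept samples that are clean:", (noisy[keep] == y[keep]).mean())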
\ No newline at end of file diff --git a/data/2020/neurips/A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms b/data/2020/neurips/A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms new file mode 100644 index 0000000000..10a655eed3 --- /dev/null +++ b/data/2020/neurips/A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms @@ -0,0 +1 @@ +This paper develops a novel and unified framework to analyze the convergence of a large family of Q-learning algorithms from the switching system perspective. We show that the nonlinear ODE models associated with Q-learning and many of its variants can be naturally formulated as affine switching systems. Building on their asymptotic stability, we obtain a number of interesting results: (i) we provide a simple ODE analysis for the convergence of asynchronous Q-learning under relatively weak assumptions; (ii) we establish the first convergence analysis of the averaging Q-learning algorithm; and (iii) we derive a new sufficient condition for the convergence of Q-learning with linear function approximation. \ No newline at end of file diff --git a/data/2020/neurips/A Unified View of Label Shift Estimation b/data/2020/neurips/A Unified View of Label Shift Estimation new file mode 100644 index 0000000000..0258b85a67 --- /dev/null +++ b/data/2020/neurips/A Unified View of Label Shift Estimation @@ -0,0 +1 @@ +Label shift describes the setting where although the label distribution might change between the source and target domains, the class-conditional probabilities (of data given a label) do not. There are two dominant approaches for estimating the label marginal. BBSE, a moment-matching approach based on confusion matrices, is provably consistent and provides interpretable error bounds. However, a maximum likelihood estimation approach, which we call MLLS, dominates empirically. In this paper, we present a unified view of the two methods and the first theoretical characterization of the likelihood-based estimator. Our contributions include (i) conditions for consistency of MLLS, which include calibration of the classifier and a confusion matrix invertibility condition that BBSE also requires; (ii) a unified view of the methods, casting BBSE as roughly equivalent to MLLS for a particular choice of calibration method; and (iii) a decomposition of MLLS's finite-sample error into terms reflecting the impacts of miscalibration and estimation error. Our analysis attributes BBSE's statistical inefficiency to a loss of information due to coarse calibration. We support our findings with experiments on both synthetic data and the MNIST and CIFAR10 image recognition datasets. \ No newline at end of file diff --git a/data/2020/neurips/A Unifying View of Optimism in Episodic Reinforcement Learning b/data/2020/neurips/A Unifying View of Optimism in Episodic Reinforcement Learning new file mode 100644 index 0000000000..274a2471bf --- /dev/null +++ b/data/2020/neurips/A Unifying View of Optimism in Episodic Reinforcement Learning @@ -0,0 +1 @@ +The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem.
This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. Typically, it was thought that these two classes of algorithms were distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis while value-optimistic algorithms are easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms which have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides being able to capture many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods. \ No newline at end of file diff --git a/data/2020/neurips/A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions b/data/2020/neurips/A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions new file mode 100644 index 0000000000..a5178cae10 --- /dev/null +++ b/data/2020/neurips/A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions @@ -0,0 +1 @@ +This paper studies the universal approximation property of deep neural networks for representing probability distributions. Given a target distribution $\pi$ and a source distribution $p_z$ both defined on $\mathbb{R}^d$, we prove under some assumptions that there exists a deep neural network $g:\mathbb{R}^d\rightarrow \mathbb{R}$ with ReLU activation such that the push-forward measure $(\nabla g)_\# p_z$ of $p_z$ under the map $\nabla g$ is arbitrarily close to the target measure $\pi$. The closeness is measured by three classes of integral probability metrics between probability distributions: $1$-Wasserstein distance, maximum mean discrepancy (MMD) and kernelized Stein discrepancy (KSD). We prove upper bounds for the size (width and depth) of the deep neural network in terms of the dimension $d$ and the approximation error $\varepsilon$ with respect to the three discrepancies. In particular, the size of the neural network can grow exponentially in $d$ when the $1$-Wasserstein distance is used as the discrepancy, whereas for both MMD and KSD the size of the neural network depends on $d$ at most polynomially. Our proof relies on convergence estimates of empirical measures under the aforementioned discrepancies and semi-discrete optimal transport. \ No newline at end of file diff --git a/data/2020/neurips/A Variational Approach for Learning from Positive and Unlabeled Data b/data/2020/neurips/A Variational Approach for Learning from Positive and Unlabeled Data new file mode 100644 index 0000000000..339c015110 --- /dev/null +++ b/data/2020/neurips/A Variational Approach for Learning from Positive and Unlabeled Data @@ -0,0 +1 @@ +Learning binary classifiers only from positive and unlabeled (PU) data is an important and challenging task in many real-world applications, including web text classification, disease gene identification and fraud detection, where negative samples are difficult to verify experimentally.
Most recent PU learning methods are developed based on the conventional misclassification risk of the supervised learning type, and they require solving the intractable risk estimation problem by approximating the negative data distribution or the class prior. In this paper, we introduce a variational principle for PU learning that allows us to quantitatively evaluate the modeling error of the Bayesian classifier directly from given data. This leads to a loss function which can be efficiently calculated without any intermediate step or model, and a variational learning method can then be employed to optimize the classifier under general conditions. In addition, the discriminative performance and numerical stability of the variational PU learning method can be further improved by incorporating a margin maximizing loss function. We illustrate the effectiveness of the proposed variational method on a number of benchmark examples. \ No newline at end of file diff --git a/data/2020/neurips/A causal view of compositional zero-shot recognition b/data/2020/neurips/A causal view of compositional zero-shot recognition new file mode 100644 index 0000000000..6015f5e6ec --- /dev/null +++ b/data/2020/neurips/A causal view of compositional zero-shot recognition @@ -0,0 +1,2 @@ +People easily recognize new visual categories that are new combinations of known components. This compositional generalization capacity is critical for learning in real-world domains like vision and language because the long tail of new combinations dominates the distribution. Unfortunately, learning systems struggle with compositional generalization because they often build on features that are correlated with class labels even if they are not "essential" for the class. This leads to consistent misclassification of samples from a new distribution, like new combinations of known components. +Here we describe an approach for compositional generalization that builds on causal ideas. First, we describe compositional zero-shot learning from a causal perspective, and propose to view zero-shot inference as finding "which intervention caused the image?". Second, we present a causal-inspired embedding model that learns disentangled representations of elementary components of visual objects from correlated (confounded) training data. We evaluate this approach on two datasets for predicting new combinations of attribute-object pairs: a well-controlled dataset of synthesized images and a real-world dataset consisting of fine-grained types of shoes. We show improvements compared to strong baselines. \ No newline at end of file diff --git a/data/2020/neurips/A convex optimization formulation for multivariate regression b/data/2020/neurips/A convex optimization formulation for multivariate regression new file mode 100644 index 0000000000..b1826e441a --- /dev/null +++ b/data/2020/neurips/A convex optimization formulation for multivariate regression @@ -0,0 +1 @@ +Multivariate regression (or multi-task learning) concerns the task of predicting the value of multiple responses from a set of covariates. In this article, we propose a convex optimization formulation for high-dimensional multivariate linear regression under a general error covariance structure. The main difficulty with simultaneous estimation of the regression coefficients and the error covariance matrix lies in the fact that the negative log-likelihood function is not convex.
To overcome this difficulty, a new parameterization is proposed, under which the negative log-likelihood function is proved to be convex. For faster computation, two other alternative loss functions are also considered, and proved to be convex under the proposed parameterization. This new parameterization is also useful for covariate-adjusted Gaussian graphical modeling in which the inverse of the error covariance matrix is of interest. A joint non-asymptotic analysis of the regression coefficients and the error covariance matrix is carried out under the new parameterization. In particular, we show that the proposed method recovers the oracle estimator under sharp scaling conditions, and rates of convergence in terms of the vector $\ell_\infty$ norm are also established. Empirically, the proposed methods outperform existing high-dimensional multivariate linear regression methods that are based on either minimizing certain non-convex criteria or certain two-step procedures. \ No newline at end of file diff --git a/data/2020/neurips/A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning b/data/2020/neurips/A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning new file mode 100644 index 0000000000..8de9373cdc --- /dev/null +++ b/data/2020/neurips/A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning @@ -0,0 +1 @@ +Multi-agent reinforcement learning has recently shown great promise as an approach to networked system control. Arguably, one of the most difficult and important tasks for which large-scale networked system control is applicable is common-pool resource management. Crucial common-pool resources include arable land, fresh water, wetlands, wildlife, fish stock, forests and the atmosphere, the proper management of which is related to some of society's greatest challenges such as food security, inequality and climate change. Here we take inspiration from a recent research program investigating the game-theoretic incentives of humans in social dilemma situations such as the well-known tragedy of the commons. However, instead of focusing on biologically evolved human-like agents, our concern is rather to better understand the learning and operating behaviour of engineered networked systems comprising general-purpose reinforcement learning agents, subject only to nonbiological constraints such as memory, computation and communication bandwidth. Harnessing tools from empirical game-theoretic analysis, we analyse the differences in resulting solution concepts that stem from employing different information structures in the design of networked multi-agent systems. These information structures pertain to the type of information shared between agents as well as the employed communication protocol and network topology. Our analysis contributes new insights into the consequences associated with certain design choices and provides an additional dimension of comparison between systems beyond efficiency, robustness, scalability and mean control performance.
\ No newline at end of file diff --git a/data/2020/neurips/A kernel test for quasi-independence b/data/2020/neurips/A kernel test for quasi-independence new file mode 100644 index 0000000000..2f770e9a38 --- /dev/null +++ b/data/2020/neurips/A kernel test for quasi-independence @@ -0,0 +1 @@ +We consider settings in which the data of interest correspond to pairs of ordered times, e.g., the birth times of the first and second child, the times at which a new user creates an account and makes the first purchase on a website, and the entry and survival times of patients in a clinical trial. In these settings, the two times are not independent (the second occurs after the first), yet it is still of interest to determine whether there exists significant dependence beyond their ordering in time. We refer to this notion as "quasi-(in)dependence". For instance, in a clinical trial, to avoid biased selection, we might wish to verify that recruitment times are quasi-independent of survival times, where dependencies might arise due to seasonal effects. In this paper, we propose a nonparametric statistical test of quasi-independence. Our test considers a potentially infinite space of alternatives, making it suitable for complex data where the nature of the possible quasi-dependence is not known in advance. Standard parametric approaches are recovered as special cases, such as the classical conditional Kendall's tau and log-rank tests. The tests apply in the right-censored setting: an essential feature in clinical trials, where patients can withdraw from the study. We provide an asymptotic analysis of our test statistic, and demonstrate in experiments that our test obtains better power than existing approaches, while being more computationally efficient. \ No newline at end of file diff --git a/data/2020/neurips/A mathematical model for automatic differentiation in machine learning b/data/2020/neurips/A mathematical model for automatic differentiation in machine learning new file mode 100644 index 0000000000..1dab52015b --- /dev/null +++ b/data/2020/neurips/A mathematical model for automatic differentiation in machine learning @@ -0,0 +1 @@ +Automatic differentiation, as implemented today, does not have a simple mathematical model adapted to the needs of modern machine learning. In this work we articulate the relationships between differentiation of programs as implemented in practice and differentiation of nonsmooth functions. To this end we provide a simple class of functions, a nonsmooth calculus, and show how they apply to stochastic approximation methods. We also highlight the issue of artificial critical points created by algorithmic differentiation and show how usual methods avoid these points with probability one. \ No newline at end of file diff --git a/data/2020/neurips/A mathematical theory of cooperative communication b/data/2020/neurips/A mathematical theory of cooperative communication new file mode 100644 index 0000000000..6fc796ee9a --- /dev/null +++ b/data/2020/neurips/A mathematical theory of cooperative communication @@ -0,0 +1 @@ +Cooperative communication plays a central role in theories of human cognition, language, development, culture, and human-robot interaction. Prior models of cooperative communication are algorithmic in nature and do not shed light on why cooperation may yield effective belief transmission and what limitations may arise due to differences between beliefs of agents.
Through a connection to the theory of optimal transport, we establish a mathematical framework for cooperative communication. We derive prior models as special cases, statistical interpretations of belief transfer plans, and proofs of robustness and instability. Computational simulations support and elaborate our theoretical results, and demonstrate fit to human behavior. The results show that cooperative communication provably enables effective, robust belief transmission, which is required to explain feats of human learning and improve human-machine interaction. \ No newline at end of file diff --git a/data/2020/neurips/A mean-field analysis of two-player zero-sum games b/data/2020/neurips/A mean-field analysis of two-player zero-sum games new file mode 100644 index 0000000000..23a275fbe0 --- /dev/null +++ b/data/2020/neurips/A mean-field analysis of two-player zero-sum games @@ -0,0 +1 @@ +Finding Nash equilibria in two-player zero-sum continuous games is a central problem in machine learning, e.g. for training both GANs and robust models. The existence of pure Nash equilibria requires strong conditions which are not typically met in practice. Mixed Nash equilibria exist in greater generality and may be found using mirror descent. Yet this approach does not scale to high dimensions. To address this limitation, we parametrize mixed strategies as mixtures of particles, whose positions and weights are updated using gradient descent-ascent. We study this dynamics as an interacting gradient flow over measure spaces endowed with the Wasserstein-Fisher-Rao metric. We establish global convergence to an approximate equilibrium for the related Langevin gradient-ascent dynamic. We prove a law of large numbers that relates particle dynamics to mean-field dynamics. Our method identifies mixed equilibria in high dimensions and is demonstrably effective for training mixtures of GANs. \ No newline at end of file diff --git a/data/2020/neurips/A meta-learning approach to (re)discover plasticity rules that carve a desired function into a neural network b/data/2020/neurips/A meta-learning approach to (re)discover plasticity rules that carve a desired function into a neural network new file mode 100644 index 0000000000..2a2df27a45 --- /dev/null +++ b/data/2020/neurips/A meta-learning approach to (re)discover plasticity rules that carve a desired function into a neural network @@ -0,0 +1 @@ +The search for biologically faithful synaptic plasticity rules has resulted in a large body of models. They are usually inspired by – and fitted to – experimental data, but they rarely produce neural dynamics that serve complex functions. These failures suggest that current plasticity models are still under-constrained by existing data. Here, we present an alternative approach that uses meta-learning to discover plausible synaptic plasticity rules. Instead of experimental data, the rules are constrained by the functions they implement and the structure they are meant to produce. Briefly, we parameterize synaptic plasticity rules by a Volterra expansion and then use supervised learning methods (gradient descent or evolutionary strategies) to minimize a problem-dependent loss function that quantifies how effectively a candidate plasticity rule transforms an initially random network into one with the desired function.
We first validate our approach by re-discovering previously described plasticity rules, starting at the single-neuron level and “Oja’s rule”, a simple Hebbian plasticity rule that captures the direction of most variability of inputs to a neuron (i.e., the first principal component). We expand the problem to the network level and ask the framework to find Oja’s rule together with an anti-Hebbian rule such that an initially random two-layer firing-rate network will recover several principal components of the input space after learning. Next, we move to networks of integrate-and-fire neurons with plastic inhibitory afferents. We train for rules that achieve a target firing rate by countering tuned excitation. Our algorithm discovers a specific subset of the manifold of rules that can solve this task. Our work is a proof of principle of an automated and unbiased approach to unveil synaptic plasticity rules that obey biological constraints and can solve complex functions. \ No newline at end of file diff --git a/data/2020/neurips/A new convergent variant of Q-learning with linear function approximation b/data/2020/neurips/A new convergent variant of Q-learning with linear function approximation new file mode 100644 index 0000000000..6ca967fbe2 --- /dev/null +++ b/data/2020/neurips/A new convergent variant of Q-learning with linear function approximation @@ -0,0 +1 @@ +In this work, we identify a novel set of conditions that ensure convergence with probability 1 of Q-learning with linear function approximation, by proposing a two-time-scale variation thereof. In the faster time scale, the algorithm features an update similar to that of DQN, where the impact of bootstrapping is attenuated by using a Q-value estimate akin to that of the target network in DQN. The slower time scale, in turn, can be seen as a modified target network update. We establish the convergence of our algorithm, provide an error bound and discuss our results in light of existing convergence results on reinforcement learning with function approximation. Finally, we illustrate the convergent behavior of our method in domains where standard Q-learning has previously been shown to diverge. \ No newline at end of file diff --git a/data/2020/neurips/A new inference approach for training shallow and deep generalized linear models of noisy interacting neurons b/data/2020/neurips/A new inference approach for training shallow and deep generalized linear models of noisy interacting neurons new file mode 100644 index 0000000000..eee470d6e9 --- /dev/null +++ b/data/2020/neurips/A new inference approach for training shallow and deep generalized linear models of noisy interacting neurons @@ -0,0 +1 @@ +Generalized linear models are one of the most efficient paradigms for predicting the correlated stochastic activity of neuronal networks in response to external stimuli, with applications in many brain areas. However, when dealing with complex stimuli, the inferred coupling parameters often do not generalize across different stimulus statistics, leading to degraded performance and blowup instabilities. Here, we develop a two-step inference strategy that allows us to train robust generalized linear models of interacting neurons, by explicitly separating the effects of correlations in the stimulus from network interactions in each training step.
Applying this approach to the responses of retinal ganglion cells to complex visual stimuli, we show that, compared to classical methods, the models trained in this way exhibit improved performance, are more stable, yield robust interaction networks, and generalize well across complex visual statistics. The method can be extended to deep convolutional neural networks, leading to models with high predictive accuracy for both the neuron firing rates and their correlations. \ No newline at end of file diff --git a/data/2020/neurips/A novel variational form of the Schatten-$p$ quasi-norm b/data/2020/neurips/A novel variational form of the Schatten-$p$ quasi-norm new file mode 100644 index 0000000000..1b3ed68cf6 --- /dev/null +++ b/data/2020/neurips/A novel variational form of the Schatten-$p$ quasi-norm @@ -0,0 +1 @@ +The Schatten-$p$ quasi-norm with $p\in(0,1)$ has recently gained considerable attention in various low-rank matrix estimation problems, offering significant benefits over relevant convex heuristics such as the nuclear norm. However, due to the nonconvexity of the Schatten-$p$ quasi-norm, minimization suffers from two major drawbacks: 1) the lack of theoretical guarantees and 2) the high computational cost demanded by the minimization task, even for trivial tasks such as finding stationary points. In an attempt to reduce the high computational cost induced by Schatten-$p$ quasi-norm minimization, variational forms, which are defined over smaller-size matrix factors whose product equals the original matrix, have been proposed. Here, we propose and analyze a novel variational form of the Schatten-$p$ quasi-norm which, for the first time in the literature, is defined for any continuous value of $p\in(0,1]$ and decouples along the columns of the factorized matrices. The proposed form can be considered the natural generalization of the well-known variational form of the nuclear norm to the nonconvex case, i.e., for $p\in(0,1)$. The resulting formulation gives way to SVD-free algorithms, thus offering lower computational complexity than the one that is induced by the original definition of the Schatten-$p$ quasi-norm. A local optimality analysis is provided which shows that we can arrive at a local minimum of the original Schatten-$p$ quasi-norm problem by reaching a local minimum of the matrix factorization based surrogate problem. In addition, for the case of the squared Frobenius loss with linear operators obeying the restricted isometry property (RIP), a rank-one update scheme is proposed, which offers a way to escape poor local minima. Finally, the efficiency of our approach is empirically shown on a matrix completion problem. \ No newline at end of file diff --git a/data/2020/neurips/A polynomial-time algorithm for learning nonparametric causal graphs b/data/2020/neurips/A polynomial-time algorithm for learning nonparametric causal graphs new file mode 100644 index 0000000000..be288efca8 --- /dev/null +++ b/data/2020/neurips/A polynomial-time algorithm for learning nonparametric causal graphs @@ -0,0 +1 @@ +We establish finite-sample guarantees for a polynomial-time algorithm for learning a nonlinear, nonparametric directed acyclic graphical (DAG) model from data. The analysis is model-free and does not assume linearity, additivity, independent noise, or faithfulness. Instead, we impose a condition on the residual variances that is closely related to previous work on linear models with equal variances.
Compared to an optimal algorithm with oracle knowledge of the variable ordering, the additional cost of the algorithm is linear in the dimension $d$ and the number of samples $n$. Finally, we compare the proposed algorithm to existing approaches in a simulation study. \ No newline at end of file diff --git a/data/2020/neurips/A shooting formulation of deep learning b/data/2020/neurips/A shooting formulation of deep learning new file mode 100644 index 0000000000..26e14752ab --- /dev/null +++ b/data/2020/neurips/A shooting formulation of deep learning @@ -0,0 +1 @@ +Continuous-depth neural networks can be viewed as deep limits of discrete neural networks whose dynamics resemble a discretization of an ordinary differential equation (ODE). Although important steps have been taken to realize the advantages of such continuous formulations, most current techniques are not truly continuous-depth as they assume identical layers. Indeed, existing works throw into relief the myriad difficulties presented by an infinite-dimensional parameter space in learning a continuous-depth neural ODE. To this end, we introduce a shooting formulation which shifts the perspective from parameterizing a network layer-by-layer to parameterizing over optimal networks described only by a set of initial conditions. For scalability, we propose a novel particle-ensemble parametrization which fully specifies the optimal weight trajectory of the continuous-depth neural network. Our experiments show that our particle-ensemble shooting formulation can achieve competitive performance, especially on long-range forecasting tasks. Finally, though the current work is inspired by continuous-depth neural networks, the particle-ensemble shooting formulation also applies to discrete-time networks and may lead to a new fertile area of research in deep learning parametrization. \ No newline at end of file diff --git a/data/2020/neurips/A simple normative network approximates local non-Hebbian learning in the cortex b/data/2020/neurips/A simple normative network approximates local non-Hebbian learning in the cortex new file mode 100644 index 0000000000..537dd2fbed --- /dev/null +++ b/data/2020/neurips/A simple normative network approximates local non-Hebbian learning in the cortex @@ -0,0 +1 @@ +To guide behavior, the brain extracts relevant features from high-dimensional data streamed by sensory organs. Neuroscience experiments demonstrate that the processing of sensory inputs by cortical neurons is modulated by instructive signals which provide context and task-relevant information. Here, adopting a normative approach, we model these instructive signals as supervisory inputs guiding the projection of the feedforward data. Mathematically, we start with a family of Reduced-Rank Regression (RRR) objective functions which include Reduced Rank (minimum) Mean Square Error (RRMSE) and Canonical Correlation Analysis (CCA), and derive novel offline and online optimization algorithms, which we call Bio-RRR. The online algorithms can be implemented by neural networks whose synaptic learning rules resemble calcium plateau potential dependent plasticity observed in the cortex. We detail how, in our model, the calcium plateau potential can be interpreted as a backpropagating error signal. We demonstrate that, despite relying exclusively on biologically plausible local learning rules, our algorithms perform competitively with existing implementations of RRMSE and CCA. 
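The Bio-RRR abstract above starts from the Reduced-Rank Regression objective. For orientation, a minimal offline sketch of classical RRR under squared loss (the textbook SVD solution, not the paper's online, biologically plausible circuit):

    import numpy as np

    def reduced_rank_regression(X, Y, rank):
        """Classical RRR: fit ordinary least squares, then project the fitted
        values onto their top principal components (Frobenius-loss solution)."""
        B_ols = np.linalg.pinv(X) @ Y             # unconstrained least squares
        _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
        V = Vt[:rank].T                           # top-`rank` right singular vectors
        return B_ols @ V @ V.T                    # rank-constrained coefficients

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))
    B_true = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 10))  # rank-3 map
    Y = X @ B_true + 0.1 * rng.normal(size=(500, 10))
    B = reduced_rank_regression(X, Y, rank=3)
    print("rank:", np.linalg.matrix_rank(B), "error:", np.linalg.norm(B - B_true))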
\ No newline at end of file diff --git a/data/2020/neurips/AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity b/data/2020/neurips/AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity new file mode 100644 index 0000000000..5cb148b6f8 --- /dev/null +++ b/data/2020/neurips/AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity @@ -0,0 +1 @@ +We present an improved method for symbolic regression that seeks to fit data to formulas that are Pareto-optimal, in the sense of having the best accuracy for a given complexity. It improves on the previous state-of-the-art by typically being orders of magnitude more robust toward noise and bad data, and also by discovering many formulas that stumped previous methods. We develop a method for discovering generalized symmetries (arbitrary modularity in the computational graph of a formula) from gradient properties of a neural network fit. We use normalizing flows to generalize our symbolic regression method to probability distributions from which we only have samples, and employ statistical hypothesis testing to accelerate robust brute-force search. \ No newline at end of file diff --git a/data/2020/neurips/AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection b/data/2020/neurips/AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection new file mode 100644 index 0000000000..41cec6164d --- /dev/null +++ b/data/2020/neurips/AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection @@ -0,0 +1 @@ +Recent studies have shown that the performance of forgery detection can be improved with diverse and challenging Deepfakes datasets. However, due to the lack of Deepfakes datasets with large variance in appearance, which can hardly be produced by recent identity swapping methods, the detection algorithm may fail in this situation. In this work, we provide a new identity swapping algorithm with large differences in appearance for face forgery detection. The appearance gaps mainly arise from the large discrepancies in illuminations and skin colors that widely exist in real-world scenarios. However, due to the difficulties of modeling the complex appearance mapping, it is challenging to transfer fine-grained appearances adaptively while preserving identity traits. This paper formulates appearance mapping as an optimal transport problem and proposes an Appearance Optimal Transport model (AOT) to formulate it in both latent and pixel space. Specifically, a relighting generator is designed to simulate the optimal transport plan. It is solved via minimizing the Wasserstein distance of the learned features in the latent space, enabling better performance and less computation than conventional optimization. To further refine the solution of the optimal transport plan, we develop a segmentation game to minimize the Wasserstein distance in the pixel space. A discriminator is introduced to distinguish the fake parts from a mix of real and fake image patches. Extensive experiments reveal the superiority of our method over state-of-the-art methods, as well as the ability of our generated data to improve the performance of face forgery detection.
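The AOT abstract above casts appearance transfer as minimizing a Wasserstein distance. A tiny illustration of the underlying quantity in 1D with scipy, using synthetic stand-ins for pixel intensities under different lighting (just the metric itself, unrelated to the paper's relighting generator or segmentation game):

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    source = rng.normal(0.45, 0.10, 5000).clip(0, 1)   # source-face intensities
    similar = rng.normal(0.47, 0.10, 5000).clip(0, 1)  # similar lighting
    darker = rng.normal(0.25, 0.08, 5000).clip(0, 1)   # much darker lighting

    # 1D Wasserstein distance: how much "mass" must move to match distributions
    print("source vs similar:", wasserstein_distance(source, similar))
    print("source vs darker: ", wasserstein_distance(source, darker))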
\ No newline at end of file diff --git a/data/2020/neurips/ARMA Nets: Expanding Receptive Field for Dense Prediction b/data/2020/neurips/ARMA Nets: Expanding Receptive Field for Dense Prediction new file mode 100644 index 0000000000..53ad7b8f1f --- /dev/null +++ b/data/2020/neurips/ARMA Nets: Expanding Receptive Field for Dense Prediction @@ -0,0 +1 @@ +Global information is essential for dense prediction problems, whose goal is to compute a discrete or continuous label for each pixel in the images. Traditional convolutional layers in neural networks, initially designed for image classification, are restrictive in these problems since the filter size limits their receptive fields. In this work, we propose to replace any traditional convolutional layer with an autoregressive moving-average (ARMA) layer, a novel module with an adjustable receptive field controlled by the learnable autoregressive coefficients. Compared with traditional convolutional layers, our ARMA layer enables explicit interconnections of the output neurons and learns its receptive field by adapting the autoregressive coefficients of the interconnections. ARMA layer is adjustable to different types of tasks: for tasks where global information is crucial, it is capable of learning relatively large autoregressive coefficients to allow for an output neuron's receptive field covering the entire input; for tasks where only local information is required, it can learn small or near zero autoregressive coefficients and automatically reduces to a traditional convolutional layer. We show both theoretically and empirically that the effective receptive field of networks with ARMA layers (named ARMA networks) expands with larger autoregressive coefficients. We also provably solve the instability problem of learning and prediction in the ARMA layer through a re-parameterization mechanism. Additionally, we demonstrate that ARMA networks substantially improve their baselines on challenging dense prediction tasks including video prediction and semantic segmentation. \ No newline at end of file diff --git a/data/2020/neurips/AViD Dataset: Anonymized Videos from Diverse Countries b/data/2020/neurips/AViD Dataset: Anonymized Videos from Diverse Countries new file mode 100644 index 0000000000..6bf8094d65 --- /dev/null +++ b/data/2020/neurips/AViD Dataset: Anonymized Videos from Diverse Countries @@ -0,0 +1 @@ +We introduce a new public video dataset for action recognition: Anonymized Videos from Diverse countries (AViD). Unlike existing public video datasets, AViD is a collection of action videos from many different countries. The motivation is to create a public dataset that would benefit training and pretraining of action recognition models for everybody, rather than making it useful for limited countries. Further, all the face identities in the AViD videos are properly anonymized to protect their privacy. It is also a static dataset in which each video is licensed under a Creative Commons license. We confirm that most of the existing video datasets are statistically biased to only capture action videos from a limited number of countries. We experimentally illustrate that models trained with such biased datasets do not transfer perfectly to action videos from other countries, and show that AViD addresses this problem. We also confirm that the new AViD dataset could serve as a good dataset for pretraining the models, performing comparably to or better than prior datasets.
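The ARMA-layer abstract above couples output neurons through learnable autoregressive coefficients. A 1D caricature of the idea, assuming a single autoregressive coefficient a: the impulse response, and thus the receptive field, stretches as |a| approaches 1 (a toy, not the paper's 2D layer or its re-parameterization):

    import numpy as np

    def arma_layer_1d(x, w, a):
        """y[t] = a * y[t-1] + (w * x)[t]: a moving-average term from the
        convolution plus an autoregressive feedback term over outputs."""
        ma = np.convolve(x, w, mode="same")
        y = np.zeros_like(ma)
        for t in range(len(ma)):
            y[t] = a * (y[t - 1] if t > 0 else 0.0) + ma[t]
        return y

    x = np.zeros(50)
    x[10] = 1.0                                   # unit impulse
    w = np.array([0.25, 0.5, 0.25])               # small MA filter
    for a in (0.0, 0.5, 0.9):
        y = arma_layer_1d(x, w, a)
        print(f"a={a}: response spans {np.sum(np.abs(y) > 1e-3)} positions")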
\ No newline at end of file diff --git a/data/2020/neurips/Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping b/data/2020/neurips/Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping new file mode 100644 index 0000000000..cb2a85a95b --- /dev/null +++ b/data/2020/neurips/Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping @@ -0,0 +1 @@ +Recently, Transformer-based language models have demonstrated remarkable performance across many NLP domains. However, the unsupervised pre-training step of these models suffers from unbearable overall computational expenses. Current methods for accelerating the pre-training either rely on massive parallelism with advanced hardware or are not applicable to language modeling. In this work, we propose a method based on progressive layer dropping that speeds up the training of Transformer-based language models, not by spending excessive hardware resources but through efficiency gains from changes to the model architecture and training technique. Extensive experiments on BERT show that the proposed method achieves a 24% time reduction on average per sample and allows the pre-training to be 2.5 times faster than the baseline while reaching similar accuracy on downstream tasks. While being faster, our pre-trained models are equipped with strong knowledge transferability, achieving comparable and sometimes higher GLUE scores than the baseline when pre-trained with the same number of samples. \ No newline at end of file diff --git a/data/2020/neurips/Acceleration with a Ball Optimization Oracle b/data/2020/neurips/Acceleration with a Ball Optimization Oracle new file mode 100644 index 0000000000..9b0fdcc422 --- /dev/null +++ b/data/2020/neurips/Acceleration with a Ball Optimization Oracle @@ -0,0 +1 @@ +Consider an oracle which takes a point $x$ and returns the minimizer of a convex function $f$ in an $\ell_2$ ball of radius $r$ around $x$. It is straightforward to show that roughly $r^{-1}\log\frac{1}{\epsilon}$ calls to the oracle suffice to find an $\epsilon$-approximate minimizer of $f$ in an $\ell_2$ unit ball. Perhaps surprisingly, this is not optimal: we design an accelerated algorithm which attains an $\epsilon$-approximate minimizer with roughly $r^{-2/3} \log \frac{1}{\epsilon}$ oracle queries, and give a matching lower bound. Further, we implement ball optimization oracles for functions with locally stable Hessians using a variant of Newton's method. The resulting algorithm applies to a number of problems of practical and theoretical import, improving upon previous results for logistic and $\ell_\infty$ regression and achieving guarantees comparable to the state-of-the-art for $\ell_p$ regression. \ No newline at end of file diff --git a/data/2020/neurips/Achieving Equalized Odds by Resampling Sensitive Attributes b/data/2020/neurips/Achieving Equalized Odds by Resampling Sensitive Attributes new file mode 100644 index 0000000000..697f788c95 --- /dev/null +++ b/data/2020/neurips/Achieving Equalized Odds by Resampling Sensitive Attributes @@ -0,0 +1 @@ +We present a flexible framework for learning predictive models that approximately satisfy the equalized odds notion of fairness. This is achieved by introducing a general discrepancy functional that rigorously quantifies violations of this criterion. This differentiable functional is used as a penalty driving the model parameters towards equalized odds.
To rigorously evaluate fitted models, we develop a formal hypothesis test to detect whether a prediction rule violates this property, the first such test in the literature. Both the model fitting and hypothesis testing leverage a resampled version of the sensitive attribute obeying equalized odds, by construction. We demonstrate the applicability and validity of the proposed framework both in regression and multi-class classification problems, reporting improved performance over state-of-the-art methods. Lastly, we show how to incorporate techniques for equitable uncertainty quantification---unbiased for each group under study---to communicate the results of the data analysis in exact terms. \ No newline at end of file diff --git a/data/2020/neurips/Active Invariant Causal Prediction: Experiment Selection through Stability b/data/2020/neurips/Active Invariant Causal Prediction: Experiment Selection through Stability new file mode 100644 index 0000000000..c8ae87cc3e --- /dev/null +++ b/data/2020/neurips/Active Invariant Causal Prediction: Experiment Selection through Stability @@ -0,0 +1 @@ +A fundamental difficulty of causal learning is that causal models can generally not be fully identified based on observational data only. Interventional data, that is, data originating from different experimental environments, improves identifiability. However, the improvement depends critically on the target and nature of the interventions carried out in each experiment. Since in real applications experiments tend to be costly, there is a need to perform the right interventions such that as few as possible are required. In this work we propose a new active learning (i.e. experiment selection) framework (A-ICP) based on Invariant Causal Prediction (ICP) (Peters et al., 2016). For general structural causal models, we characterize the effect of interventions on so-called stable sets, a notion introduced by Pfister et al. (2019). We leverage these results to propose several intervention selection policies for A-ICP which quickly reveal the direct causes of a response variable in the causal graph while maintaining the error control inherent in ICP. Empirically, we analyze the performance of the proposed policies in both population and finite-regime experiments. \ No newline at end of file diff --git a/data/2020/neurips/Active Structure Learning of Causal DAGs via Directed Clique Trees b/data/2020/neurips/Active Structure Learning of Causal DAGs via Directed Clique Trees new file mode 100644 index 0000000000..ac32a6a858 --- /dev/null +++ b/data/2020/neurips/Active Structure Learning of Causal DAGs via Directed Clique Trees @@ -0,0 +1 @@ +A growing body of work has begun to study intervention design for efficient structure learning of causal directed acyclic graphs (DAGs). A typical setting is a causally sufficient setting, i.e. a system with no latent confounders, selection bias, or feedback, when the essential graph of the observational equivalence class (EC) is given as an input and interventions are assumed to be noiseless. Most existing works focus on worst-case or average-case lower bounds for the number of interventions required to orient a DAG. These worst-case lower bounds only establish that the largest clique in the essential graph could make it difficult to learn the true DAG. In this work, we develop a universal lower bound for single-node interventions that establishes that the largest clique is always a fundamental impediment to structure learning.
Specifically, we present a decomposition of a DAG into independently orientable components through directed clique trees and use it to prove that the number of single-node interventions necessary to orient any DAG in an EC is at least the sum of half the size of the largest cliques in each chain component of the essential graph. Moreover, we present a two-phase intervention design algorithm that, under certain conditions on the chordal skeleton, matches the optimal number of interventions up to a multiplicative logarithmic factor in the number of maximal cliques. We show via synthetic experiments that our algorithm can scale to much larger graphs than most of the related work and achieves better worst-case performance than other scalable approaches. A code base to recreate these results can be found at this https URL \ No newline at end of file diff --git a/data/2020/neurips/AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients b/data/2020/neurips/AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients new file mode 100644 index 0000000000..d55db36846 --- /dev/null +++ b/data/2020/neurips/AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients @@ -0,0 +1 @@ +Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse than SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are typically the default because of their stability. We propose AdaBelief to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability. The intuition for AdaBelief is to adapt the stepsize according to the "belief" in the current gradient direction. Viewing the exponential moving average (EMA) of the noisy gradient as the prediction of the gradient at the next time step, if the observed gradient greatly deviates from the prediction, we distrust the current observation and take a small step; if the observed gradient is close to the prediction, we trust it and take a large step. We validate AdaBelief in extensive experiments, showing that it outperforms other methods with fast convergence and high accuracy on image classification and language modeling. Specifically, on ImageNet, AdaBelief achieves comparable accuracy to SGD. Furthermore, in the training of a GAN on Cifar10, AdaBelief demonstrates high stability and improves the quality of generated samples compared to a well-tuned Adam optimizer. Code is available at this https URL \ No newline at end of file diff --git a/data/2020/neurips/AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning b/data/2020/neurips/AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning new file mode 100644 index 0000000000..8efc4abda4 --- /dev/null +++ b/data/2020/neurips/AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning @@ -0,0 +1 @@ +Multi-task learning is an open and challenging problem in computer vision. The typical way of conducting multi-task learning with deep neural networks is either through handcrafting schemes that share all initial layers and branch out at an ad hoc point or through using separate task-specific networks with an additional feature sharing/fusion mechanism.
Unlike existing methods, we propose an adaptive sharing approach, called AdaShare, that decides what to share across which tasks to achieve the best recognition accuracy, while taking resource efficiency into account. Specifically, our main idea is to learn the sharing pattern through a task-specific policy that selectively chooses which layers to execute for a given task in the multi-task network. We efficiently optimize the task-specific policy jointly with the network weights using standard back-propagation. Experiments on three challenging and diverse benchmark datasets with a variable number of tasks clearly demonstrate the efficacy of our approach over state-of-the-art methods. \ No newline at end of file diff --git a/data/2020/neurips/AdaTune: Adaptive Tensor Program Compilation Made Efficient b/data/2020/neurips/AdaTune: Adaptive Tensor Program Compilation Made Efficient new file mode 100644 index 0000000000..f1834d269a --- /dev/null +++ b/data/2020/neurips/AdaTune: Adaptive Tensor Program Compilation Made Efficient @@ -0,0 +1 @@ +Deep learning models are computationally intense, and implementations often have to be highly optimized by experts or hardware vendors to be usable in practice. The DL compiler, together with Learning-to-Compile, has proven to be a powerful technique for optimizing tensor programs. However, a limitation of this approach is that it still suffers from an unbearably long overall optimization time. In this paper, we present a new method, called AdaTune, that significantly reduces the optimization time of tensor programs for high-performance deep learning inference. In particular, we propose an adaptive evaluation method that statistically terminates costly hardware measurements early without losing much accuracy. We further devise a surrogate model with uncertainty quantification that allows the optimization to adapt to hardware and model heterogeneity better. Finally, we introduce a contextual optimizer that provides adaptive control of the exploration and exploitation to improve the effectiveness of searching the transformation space. We evaluate and compare the levels of optimization obtained by AutoTVM, a state-of-the-art Learning-to-Compile technique on top of TVM, and AdaTune. The experimental results show that AdaTune obtains up to 115% higher GFLOPS than the baseline under the same optimization time budget. Furthermore, AdaTune provides 1.3–3.9$\times$ speedup in optimization time over the baseline to reach the same optimization quality for a range of models across different hardware architectures. \ No newline at end of file diff --git a/data/2020/neurips/Adam with Bandit Sampling for Deep Learning b/data/2020/neurips/Adam with Bandit Sampling for Deep Learning new file mode 100644 index 0000000000..73eebc9fde --- /dev/null +++ b/data/2020/neurips/Adam with Bandit Sampling for Deep Learning @@ -0,0 +1 @@ +Adam is a widely used optimization method for training deep learning models. It computes individual adaptive learning rates for different parameters. In this paper, we propose a generalization of Adam, called Adambs, that allows us to also adapt to different training examples based on their importance to the model's convergence. To achieve this, we maintain a distribution over all examples, selecting a mini-batch in each iteration by sampling according to this distribution, which we update using a multi-armed bandit algorithm. This ensures that examples that are more beneficial to the model training are sampled with higher probabilities.
We theoretically show that Adambs improves the convergence rate of Adam, achieving $O(\sqrt{\frac{\log n}{T}})$ instead of $O(\sqrt{\frac{n}{T}})$ in some cases. Experiments on various models and datasets demonstrate Adambs's fast convergence in practice. \ No newline at end of file diff --git a/data/2020/neurips/Adaptation Properties Allow Identification of Optimized Neural Codes b/data/2020/neurips/Adaptation Properties Allow Identification of Optimized Neural Codes new file mode 100644 index 0000000000..7864aeedfb --- /dev/null +++ b/data/2020/neurips/Adaptation Properties Allow Identification of Optimized Neural Codes @@ -0,0 +1 @@ +The adaptation of neural codes to the statistics of their environment is well captured by efficient coding approaches. Here we solve an inverse problem: characterizing the objective and constraint functions that efficient codes appear to be optimal for, on the basis of how they adapt to different stimulus distributions. We formulate a general efficient coding problem, with flexible objective and constraint functions and minimal parametric assumptions. Solving special cases of this model, we provide solutions to broad classes of Fisher information-based efficient coding problems, generalizing a wide range of previous results. We show that different objective function types impose qualitatively different adaptation behaviors, while constraints enforce characteristic deviations from classic efficient coding signatures. Despite interaction between these effects, clear signatures emerge for both unconstrained optimization problems and information-maximizing objective functions. Asking for a fixed point of the neural code adaptation, we find an objective-independent characterization of constraints on the neural code. We use this result to propose an experimental paradigm that can characterize both the objective and constraint functions that an observed code appears to be optimized for. \ No newline at end of file diff --git a/data/2020/neurips/Adapting Neural Architectures Between Domains b/data/2020/neurips/Adapting Neural Architectures Between Domains new file mode 100644 index 0000000000..de18ac494c --- /dev/null +++ b/data/2020/neurips/Adapting Neural Architectures Between Domains @@ -0,0 +1 @@ +Neural architecture search (NAS) has demonstrated impressive performance in automatically designing high-performance neural networks. The power of deep neural networks is unleashed when analyzing large volumes of data (e.g. ImageNet), but the architecture search is often executed on another, smaller dataset (e.g. CIFAR-10) so that it finishes in a feasible time. However, it is hard to guarantee that the optimal architecture derived on the proxy task could maintain its advantages on another more challenging dataset. This paper aims to improve the generalization of neural architectures via domain adaptation. We analyze the generalization bounds of the derived architecture and suggest its close relation to the validation error and the data distribution distance on both domains. These theoretical analyses lead to AdaptNAS, a novel and principled approach to adapt neural architectures between domains in NAS. Our experimental evaluation shows that only a small part of ImageNet is sufficient for AdaptNAS to extend its architecture success to the entire ImageNet and outperform state-of-the-art comparison algorithms.
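The AdaBelief abstract above adapts the stepsize by how far the observed gradient deviates from its EMA prediction. A compact numpy sketch of that published update rule, bias correction included (a reference sketch, not the authors' released implementation):

    import numpy as np

    def adabelief_step(theta, grad, m, s, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        """One AdaBelief step: s tracks the squared deviation of grad from its
        EMA prediction m; small deviation (high 'belief') yields a larger step."""
        m = b1 * m + (1 - b1) * grad                   # EMA of gradients
        s = b2 * s + (1 - b2) * (grad - m) ** 2 + eps  # EMA of squared deviation
        m_hat = m / (1 - b1 ** t)                      # bias corrections
        s_hat = s / (1 - b2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)
        return theta, m, s

    theta = np.ones(5)                  # minimize f(x) = ||x||^2 as a smoke test
    m, s = np.zeros(5), np.zeros(5)
    for t in range(1, 2001):
        theta, m, s = adabelief_step(theta, 2 * theta, m, s, t, lr=1e-2)
    print("final ||theta||:", np.linalg.norm(theta))   # small: the iterates converged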
\ No newline at end of file diff --git a/data/2020/neurips/Adapting to Misspecification in Contextual Bandits b/data/2020/neurips/Adapting to Misspecification in Contextual Bandits new file mode 100644 index 0000000000..972132afef --- /dev/null +++ b/data/2020/neurips/Adapting to Misspecification in Contextual Bandits @@ -0,0 +1 @@ +A major research direction in contextual bandits is to develop algorithms that are computationally efficient, yet support flexible, general-purpose function approximation. Algorithms based on modeling rewards have shown strong empirical performance, but typically require a well-specified model, and can fail when this assumption does not hold. Can we design algorithms that are efficient and flexible, yet degrade gracefully in the face of model misspecification? We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits that adapt to unknown model misspecification -- for both finite and infinite action settings. Given access to an online oracle for square loss regression, our algorithm attains optimal regret and -- in particular -- optimal dependence on the misspecification level, with no prior knowledge. Specializing to linear contextual bandits with infinite actions in $d$ dimensions, we obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level $\varepsilon$. On a conceptual level, our results are enabled by a new optimization-based perspective on the regression oracle reduction framework of Foster and Rakhlin, which we anticipate will find broader use. \ No newline at end of file diff --git a/data/2020/neurips/Adaptive Discretization for Model-Based Reinforcement Learning b/data/2020/neurips/Adaptive Discretization for Model-Based Reinforcement Learning new file mode 100644 index 0000000000..f42c9ed758 --- /dev/null +++ b/data/2020/neurips/Adaptive Discretization for Model-Based Reinforcement Learning @@ -0,0 +1,2 @@ +We introduce the technique of adaptive discretization to design efficient model-based episodic reinforcement learning algorithms in large (potentially continuous) state-action spaces. Our algorithm is based on optimistic one-step value iteration extended to maintain an adaptive discretization of the space. From a theoretical perspective, we provide worst-case regret bounds for our algorithm, which are competitive compared to the state-of-the-art model-based algorithms; moreover, our bounds are obtained via a modular proof technique, which can potentially extend to incorporate additional structure on the problem. +From an implementation standpoint, our algorithm has much lower storage and computational requirements, due to maintaining a more efficient partition of the state and action spaces. We illustrate this via experiments on several canonical control problems, which show that our algorithm empirically performs significantly better than fixed discretization in terms of both faster convergence and lower memory usage. Interestingly, we observe empirically that while fixed-discretization model-based algorithms vastly outperform their model-free counterparts, the two achieve comparable performance with adaptive discretization.
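A toy sketch of the adaptive-discretization idea from the abstract above: keep value estimates on a partition of the state space and refine cells where visits concentrate. The one-dimensional setting, the split rule, and the running-average update are simplifications assumed for illustration; the paper's algorithm performs optimistic one-step value iteration over state-action cells.

```python
import random

class Cell:
    """One interval of an adaptive partition of the state space [0, 1]."""
    def __init__(self, lo, hi):
        self.lo, self.hi, self.visits, self.q = lo, hi, 0, 1.0  # optimistic init
        self.children = None

    def locate(self, s):
        if self.children is None:
            return self
        left, right = self.children
        return left.locate(s) if s < left.hi else right.locate(s)

    def maybe_split(self):
        # illustrative rule: refine a cell once its visit count exceeds the
        # inverse square of its diameter, so high-traffic regions get finer cells
        if self.children is None and self.visits >= (1.0 / (self.hi - self.lo)) ** 2:
            mid = (self.lo + self.hi) / 2
            self.children = (Cell(self.lo, mid), Cell(mid, self.hi))

root = Cell(0.0, 1.0)
for episode in range(2000):
    s = random.random()                        # stand-in for an environment state
    cell = root.locate(s)
    cell.visits += 1
    reward = 1.0 - abs(s - 0.5)                # toy reward peaked at s = 0.5
    cell.q += (reward - cell.q) / cell.visits  # running-average value estimate
    cell.maybe_split()
```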
\ No newline at end of file diff --git a/data/2020/neurips/Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach b/data/2020/neurips/Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach new file mode 100644 index 0000000000..d1256b4522 --- /dev/null +++ b/data/2020/neurips/Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach @@ -0,0 +1,2 @@ +Suppose an online platform wants to compare a treatment and control policy, e.g., two different matching algorithms in a ridesharing system, or two different inventory management algorithms in an online retail site. Standard randomized controlled trials are typically not feasible, since the goal is to estimate policy performance on the entire system. Instead, the typical current practice involves dynamically alternating between the two policies for fixed lengths of time, and comparing the average performance of each over the intervals in which they were run as an estimate of the treatment effect. However, this approach suffers from *temporal interference*: one algorithm alters the state of the system as seen by the second algorithm, biasing estimates of the treatment effect. Further, the simple non-adaptive nature of such designs implies they are not sample efficient. +We develop a benchmark theoretical model in which to study optimal experimental design for this setting. We view testing the two policies as the problem of estimating the steady state difference in reward between two unknown Markov chains (i.e., policies). We assume estimation of the steady state reward for each chain proceeds via nonparametric maximum likelihood, and search for consistent (i.e., asymptotically unbiased) experimental designs that are efficient (i.e., asymptotically minimum variance). Characterizing such designs is equivalent to a Markov decision problem with a minimum variance objective; such problems generally do not admit tractable solutions. Remarkably, in our setting, using a novel application of classical martingale analysis of Markov chains via Poisson's equation, we characterize efficient designs via a succinct convex optimization problem. We use this characterization to propose a consistent, efficient online experimental design that adaptively samples the two Markov chains. \ No newline at end of file diff --git a/data/2020/neurips/Adaptive Gradient Quantization for Data-Parallel SGD b/data/2020/neurips/Adaptive Gradient Quantization for Data-Parallel SGD new file mode 100644 index 0000000000..8b362e07f2 --- /dev/null +++ b/data/2020/neurips/Adaptive Gradient Quantization for Data-Parallel SGD @@ -0,0 +1 @@ +Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during the training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters. 
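A minimal sketch in the spirit of the adaptive gradient quantization abstract above: stochastically round normalized gradient magnitudes onto a small level set, then refit the levels to the gradient statistics observed at each step. Using empirical quantiles as the adaptation rule is an assumption for illustration; ALQ and AMQ instead fit sufficient statistics of a parametric distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_quantize(g, levels):
    """Stochastic rounding of |g|/||g|| onto a sorted level set (unbiased
    between adjacent levels; values beyond the top level clip to it)."""
    norm = np.linalg.norm(g)
    r = np.abs(g) / norm
    hi = np.searchsorted(levels, r).clip(1, len(levels) - 1)
    lo = hi - 1
    frac = ((r - levels[lo]) / (levels[hi] - levels[lo])).clip(0, 1)
    pick = rng.random(g.shape) < frac          # round up with matching probability
    q = levels[np.where(pick, hi, lo)]
    return norm * np.sign(g) * q

levels = np.linspace(0.0, 1.0, 8)              # initial uniform codebook
for step in range(100):
    g = rng.normal(size=1024) * 0.1            # stand-in for a stochastic gradient
    g_hat = stochastic_quantize(g, levels)
    # adaptive step: refit the levels to quantiles of the observed normalized
    # magnitudes, so the codebook tracks the changing gradient statistics
    r = np.abs(g) / np.linalg.norm(g)
    levels = np.quantile(r, np.linspace(0, 1, 8))
    levels[0] = 0.0
```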
\ No newline at end of file diff --git a/data/2020/neurips/Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting b/data/2020/neurips/Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting new file mode 100644 index 0000000000..3f49227635 --- /dev/null +++ b/data/2020/neurips/Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting @@ -0,0 +1 @@ +Modeling complex spatial and temporal correlations in correlated time series data is indispensable for understanding traffic dynamics and predicting the future status of an evolving traffic system. Recent works focus on designing complicated graph neural network architectures to capture shared patterns with the help of pre-defined graphs. In this paper, we argue that learning node-specific patterns is essential for traffic forecasting while the pre-defined graph is avoidable. To this end, we propose two adaptive modules for enhancing Graph Convolutional Network (GCN) with new capabilities: 1) a Node Adaptive Parameter Learning (NAPL) module to capture node-specific patterns; 2) a Data Adaptive Graph Generation (DAGG) module to infer the inter-dependencies among different traffic series automatically. We further propose an Adaptive Graph Convolutional Recurrent Network (AGCRN) to capture fine-grained spatial and temporal correlations in traffic series automatically based on the two modules and recurrent networks. Our experiments on two real-world traffic datasets show that AGCRN outperforms state-of-the-art methods by a significant margin without using pre-defined graphs of spatial connections. \ No newline at end of file diff --git a/data/2020/neurips/Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes b/data/2020/neurips/Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes new file mode 100644 index 0000000000..6b64c78cfa --- /dev/null +++ b/data/2020/neurips/Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes @@ -0,0 +1 @@ +Reducing the variance of the gradient estimator is known to improve the convergence rate of stochastic gradient-based optimization and sampling algorithms. One way of achieving variance reduction is to design importance sampling strategies. Recently, the problem of designing such schemes was formulated as an online learning problem with bandit feedback, and algorithms with sub-linear static regret were designed. In this work, we build on this framework and propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes. Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $\mathcal{O}(1/t)$ step sizes. We achieve this dynamic regret bound by leveraging our knowledge of the dynamics defined by the algorithm, and combining ideas from online learning and variance-reduced stochastic optimization. We empirically validate the performance of our algorithm and identify settings in which it leads to significant improvements.
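The skeleton of adaptive importance sampling for finite sums, as in the Avare abstract above: sample an example index from a maintained distribution, reweight its gradient by $1/(np_i)$ to keep the estimate unbiased, and update the distribution from what was observed. The gradient-norm proxy and the naive score update below are stand-ins; Avare's actual update is designed to control dynamic regret.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
scores = np.ones(n)                   # running per-example gradient-norm estimates

def sample_index():
    p = scores / scores.sum()
    i = rng.choice(n, p=p)
    return i, p[i]

for t in range(1, 1001):
    i, p_i = sample_index()
    g_norm = rng.random() + 0.01      # stand-in for the sampled example's gradient norm
    # unbiased finite-sum gradient estimate: scale the sampled gradient by 1/(n * p_i)
    scale = 1.0 / (n * p_i)
    grad_estimate = scale * g_norm    # stands in for (1/(n p_i)) * grad_i
    # refresh the visited example's score; the sampler then tracks the
    # (slowly changing) optimal importance distribution as step sizes decay
    scores[i] = g_norm
```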
\ No newline at end of file diff --git a/data/2020/neurips/Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment b/data/2020/neurips/Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment new file mode 100644 index 0000000000..1facd6a333 --- /dev/null +++ b/data/2020/neurips/Adaptive Learning of Rank-One Models for Efficient Pairwise Sequence Alignment @@ -0,0 +1 @@ +Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. State-of-the-art approaches to speed up this task use hashing to identify short segments (k-mers) that are shared by pairs of reads, which can then be used to estimate alignment scores. However, when the number of reads is large, accurately estimating alignment scores for all pairs is still very costly. Moreover, in practice, one is only interested in identifying pairs of reads with large alignment scores. In this work, we propose a new approach to pairwise alignment estimation based on two key new ingredients. The first ingredient is to cast the problem of pairwise alignment estimation under a general framework of rank-one crowdsourcing models, where the workers' responses correspond to k-mer hash collisions. These models can be accurately solved via a spectral decomposition of the response matrix. The second ingredient is to utilise a multi-armed bandit algorithm to adaptively refine this spectral estimator only for read pairs that are likely to have large alignments. The resulting algorithm iteratively performs a spectral decomposition of the response matrix for adaptively chosen subsets of the read pairs. \ No newline at end of file diff --git a/data/2020/neurips/Adaptive Online Estimation of Piecewise Polynomial Trends b/data/2020/neurips/Adaptive Online Estimation of Piecewise Polynomial Trends new file mode 100644 index 0000000000..0532bd06e0 --- /dev/null +++ b/data/2020/neurips/Adaptive Online Estimation of Piecewise Polynomial Trends @@ -0,0 +1 @@ +We consider the framework of non-stationary stochastic optimization [Besbes et al, 2015] with squared error losses and noisy gradient feedback where the dynamic regret of an online learner against a time varying comparator sequence is studied. Motivated by the theory of non-parametric regression, we introduce a new variational constraint that enforces the comparator sequence to belong to a discrete $k^{th}$ order Total Variation ball of radius $C_n$. This variational constraint models comparators that have piece-wise polynomial structure, which has many relevant practical applications [Tibshirani, 2014]. By establishing connections to the theory of wavelet based non-parametric regression, we design a polynomial time algorithm that achieves the nearly optimal dynamic regret of $\tilde{O}(n^{\frac{1}{2k+3}}C_n^{\frac{2}{2k+3}})$. The proposed policy is adaptive to the unknown radius $C_n$. Further, we show that the same policy is minimax optimal for several other non-parametric families of interest.
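The spectral step in the pairwise-alignment abstract above, which casts alignment estimation as a rank-one crowdsourcing problem, reduces to estimating a rank-one factor from a noisy response matrix; the top eigenvector recovers it up to sign and scale. A minimal sketch under assumed dense, symmetric, Gaussian-noise observations (no hashing and no bandit refinement):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.random(n)                          # latent per-read "quality" parameters
M = np.outer(x, x)                         # expected pairwise collision rates
obs = M + 0.05 * rng.normal(size=(n, n))   # noisy observed response matrix
obs = (obs + obs.T) / 2                    # symmetrize

# spectral estimate: for M = x x^T, the top eigenvalue is ||x||^2 and the
# top eigenvector is x/||x||, so their combination recovers x
vals, vecs = np.linalg.eigh(obs)           # eigenvalues in ascending order
x_hat = np.sqrt(vals[-1]) * np.abs(vecs[:, -1])  # fix sign for a nonnegative factor

err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
print(f"relative estimation error: {err:.3f}")
```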
\ No newline at end of file diff --git a/data/2020/neurips/Adaptive Probing Policies for Shortest Path Routing b/data/2020/neurips/Adaptive Probing Policies for Shortest Path Routing new file mode 100644 index 0000000000..6d2b351751 --- /dev/null +++ b/data/2020/neurips/Adaptive Probing Policies for Shortest Path Routing @@ -0,0 +1 @@ +Inspired by traffic routing applications, we consider the problem of finding the shortest path from a source s to a destination t in a graph, when the lengths of the edges are unknown. Instead, we are given hints or predictions of the edge lengths from a collection of ML models, trained possibly on historical data and other contexts in the network. Additionally, we assume that the true length of any candidate path can be obtained by probing an up-to-date snapshot of the network. However, each probe introduces a latency, and thus the goal is to minimize the number of probes while finding a near-optimal path with high probability. We formalize this problem and show assumptions under which it admits efficient approximation algorithms. We verify these assumptions and validate the performance of our algorithms on real data. \ No newline at end of file diff --git a/data/2020/neurips/Adaptive Reduced Rank Regression b/data/2020/neurips/Adaptive Reduced Rank Regression new file mode 100644 index 0000000000..eecaf70cdc --- /dev/null +++ b/data/2020/neurips/Adaptive Reduced Rank Regression @@ -0,0 +1 @@ +Low rank regression has proven to be useful in a wide range of forecasting problems. However, in settings with a low signal-to-noise ratio, it is known to suffer from severe overfitting. This paper studies the reduced rank regression problem and presents algorithms with provable generalization guarantees. We use adaptive hard rank-thresholding in two different parts of the data analysis pipeline. First, we consider a low rank projection of the data to eliminate the components that are most likely to be noisy. Second, we compute a standard multivariate linear regression estimate on the data obtained in the first step, and subsequently consider a low-rank projection of the resulting regression matrix. Both thresholding steps are performed in a data-driven manner and are required to prevent severe overfitting, as our lower bounds show. Experimental results show that our approach either outperforms or is competitive with existing baselines. \ No newline at end of file diff --git a/data/2020/neurips/Adaptive Sampling for Stochastic Risk-Averse Learning b/data/2020/neurips/Adaptive Sampling for Stochastic Risk-Averse Learning new file mode 100644 index 0000000000..9a6bb3ff85 --- /dev/null +++ b/data/2020/neurips/Adaptive Sampling for Stochastic Risk-Averse Learning @@ -0,0 +1 @@ +We consider the problem of training machine learning models in a risk-averse manner. In particular, we propose an adaptive sampling algorithm for stochastically optimizing the Conditional Value-at-Risk (CVaR) of a loss distribution. We use a distributionally robust formulation of the CVaR to phrase the problem as a zero-sum game between two players. Our approach solves the game using an efficient no-regret algorithm for each player. Critically, we can apply these algorithms to large-scale settings because the implementation relies on sampling from Determinantal Point Processes. Finally, we empirically demonstrate its effectiveness on large-scale convex and non-convex learning tasks.
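The CVaR objective optimized in the abstract above admits the classic Rockafellar–Uryasev formulation $\mathrm{CVaR}_\alpha(L) = \min_\tau \tau + \mathbb{E}[(L-\tau)_+]/(1-\alpha)$, which can be minimized jointly over model parameters and the threshold by plain SGD. The sketch below shows only this textbook baseline; the paper's contributions (the distributionally robust game and DPP-based sampling) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.95                          # CVaR level: average over the worst 5% of losses

# Rockafellar-Uryasev: CVaR_alpha(L) = min_tau  tau + E[(L - tau)_+] / (1 - alpha).
# Run SGD jointly on a toy model parameter theta and the auxiliary threshold tau.
theta, tau, lr = 0.0, 0.0, 0.05
for step in range(5000):
    z = rng.normal()                  # stand-in data point
    loss = (theta - z) ** 2           # per-sample loss of the current model
    excess = loss > tau
    # subgradients of  tau + (loss - tau)_+ / (1 - alpha)
    g_tau = 1.0 - excess / (1 - alpha)
    g_theta = excess * 2 * (theta - z) / (1 - alpha)
    tau -= lr * g_tau
    theta -= lr * g_theta
```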
\ No newline at end of file diff --git a/data/2020/neurips/Adaptive Shrinkage Estimation for Streaming Graphs b/data/2020/neurips/Adaptive Shrinkage Estimation for Streaming Graphs new file mode 100644 index 0000000000..3ba0376dde --- /dev/null +++ b/data/2020/neurips/Adaptive Shrinkage Estimation for Streaming Graphs @@ -0,0 +1 @@ +Networks are a natural representation of complex systems across the sciences, and higher-order dependencies are central to the understanding and modeling of these systems. However, in many practical applications such as online social networks, networks are massive, dynamic, and naturally streaming, where pairwise interactions among vertices become available one at a time in some arbitrary order. The massive size and streaming nature of these networks allow only partial observation, since it is infeasible to analyze the entire network. Under such scenarios, it is challenging to study the higher-order structural and connectivity patterns of streaming networks. In this work, we consider the fundamental problem of estimating the higher-order dependencies using adaptive sampling. We propose a novel adaptive, single-pass sampling framework and unbiased estimators for higher-order network analysis of large streaming networks. Our algorithms exploit adaptive techniques to identify edges that are highly informative for efficiently estimating the higher-order structure of streaming networks from small sample data. We also introduce a novel James-Stein shrinkage estimator to reduce the estimation error. Our approach is fully analytic, computationally efficient, and can be incrementally updated in a streaming setting. Numerical experiments on large networks show that our approach is superior to baseline methods. \ No newline at end of file diff --git a/data/2020/neurips/AdvFlow: Inconspicuous Black-box Adversarial Attacks using Normalizing Flows b/data/2020/neurips/AdvFlow: Inconspicuous Black-box Adversarial Attacks using Normalizing Flows new file mode 100644 index 0000000000..70b3953e38 --- /dev/null +++ b/data/2020/neurips/AdvFlow: Inconspicuous Black-box Adversarial Attacks using Normalizing Flows @@ -0,0 +1 @@ +Deep learning classifiers are susceptible to well-crafted, imperceptible variations of their inputs, known as adversarial attacks. In this regard, the study of powerful attack models sheds light on the sources of vulnerability in these classifiers, hopefully leading to more robust ones. In this paper, we introduce AdvFlow: a novel black-box adversarial attack method on image classifiers that exploits the power of normalizing flows to model the density of adversarial examples around a given target image. We see that the proposed method generates adversaries that closely follow the clean data distribution, a property which makes their detection less likely. Also, our experimental results show competitive performance of the proposed approach with some of the existing attack methods on defended classifiers, outperforming them in both the number of queries and attack success rate. The code is available at this https URL. 
\ No newline at end of file diff --git a/data/2020/neurips/Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization b/data/2020/neurips/Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization new file mode 100644 index 0000000000..a44be84191 --- /dev/null +++ b/data/2020/neurips/Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization @@ -0,0 +1 @@ +Recent research has seen several advances relevant to black-box VI, but the current state of automatic posterior inference is unclear. One such advance is the use of normalizing flows to define flexible posterior densities for deep latent variable models. Another direction is the integration of Monte-Carlo methods to serve two purposes: first, to obtain tighter variational objectives for optimization, and second, to define enriched variational families through sampling. However, both flows and variational Monte-Carlo methods remain relatively unexplored for black-box VI. Moreover, on a pragmatic front, there are several optimization considerations like step-size scheme, parameter initialization, and choice of gradient estimators, for which there is no clear guidance in the existing literature. In this paper, we postulate that black-box VI is best addressed through a careful combination of numerous algorithmic components. We evaluate components relating to optimization, flows, and Monte-Carlo methods on a benchmark of 30 models from the Stan model library. The combination of these algorithmic components significantly advances the state-of-the-art "out of the box" variational inference. \ No newline at end of file diff --git a/data/2020/neurips/Adversarial Attacks on Deep Graph Matching b/data/2020/neurips/Adversarial Attacks on Deep Graph Matching new file mode 100644 index 0000000000..3d9adc8e30 --- /dev/null +++ b/data/2020/neurips/Adversarial Attacks on Deep Graph Matching @@ -0,0 +1 @@ +Despite achieving remarkable performance, deep graph learning models, such as node classification and network embedding, are vulnerable to small adversarial perturbations. However, the vulnerability analysis of graph matching under adversarial attacks has not been fully investigated yet. This paper proposes an adversarial attack model with two novel attack techniques to perturb the graph structure and degrade the quality of deep graph matching: (1) a kernel density estimation approach is utilized to estimate and maximize node densities to derive imperceptible perturbations, by pushing attacked nodes to dense regions in two graphs, such that they are indistinguishable from many neighbors; and (2) a meta learning-based projected gradient descent method is developed to well choose attack starting points and to improve the search performance for producing effective perturbations. We evaluate the effectiveness of the attack model on real datasets and validate that the attacks can be transferable to other graph learning models. \ No newline at end of file diff --git a/data/2020/neurips/Adversarial Attacks on Linear Contextual Bandits b/data/2020/neurips/Adversarial Attacks on Linear Contextual Bandits new file mode 100644 index 0000000000..890e4b218b --- /dev/null +++ b/data/2020/neurips/Adversarial Attacks on Linear Contextual Bandits @@ -0,0 +1 @@ +Contextual bandit algorithms are applied in a wide range of domains, from advertising to recommender systems, from clinical trials to education.
In many of these domains, malicious agents may have incentives to attack the bandit algorithm to induce it to perform a desired behavior. For instance, an unscrupulous ad publisher may try to increase their own revenue at the expense of the advertisers; a seller may want to increase the exposure of their products, or thwart a competitor's advertising campaign. In this paper, we study several attack scenarios and show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm $T - o(T)$ times over a horizon of $T$ steps, while applying adversarial modifications to either rewards or contexts that only grow logarithmically as $O(\log T)$. We also investigate the case when a malicious agent is interested in affecting the behavior of the bandit algorithm in a single context (e.g., a specific user). We first provide sufficient conditions for the feasibility of the attack and we then propose an efficient algorithm to perform the attack. We validate our theoretical results on experiments performed on both synthetic and real-world datasets. \ No newline at end of file diff --git a/data/2020/neurips/Adversarial Bandits with Corruptions: Regret Lower Bound and No-regret Algorithm b/data/2020/neurips/Adversarial Bandits with Corruptions: Regret Lower Bound and No-regret Algorithm new file mode 100644 index 0000000000..e69de29bb2 diff --git a/data/2020/neurips/Adversarial Blocking Bandits b/data/2020/neurips/Adversarial Blocking Bandits new file mode 100644 index 0000000000..0c66c8253e --- /dev/null +++ b/data/2020/neurips/Adversarial Blocking Bandits @@ -0,0 +1 @@ +We consider a general adversarial multi-armed blocking bandit setting where each played arm can be blocked (unavailable) for some time periods, and rewards are chosen adversarially at each time period, without following any distribution. The setting models scenarios of allocating scarce, limited supplies (e.g., arms) where the supplies replenish and can be reused only after certain time periods. We first show that, in the optimization setting, when the blocking durations and rewards are known in advance, finding an optimal policy (i.e., determining which arm to play in each round) that maximises the cumulative reward is strongly NP-hard, eliminating the possibility of a fully polynomial-time approximation scheme (FPTAS) for the problem unless P = NP. To complement our result, we show that a greedy algorithm that plays the best available arm at each round provides an approximation guarantee that depends on the blocking durations and the path variance of the rewards. In the bandit setting, when the blocking durations and rewards are not known, we design two algorithms, RGA and RGA-META, for the case of bounded duration and path variation. In particular, when the variation budget $B_T$ is known in advance, RGA can achieve $O(\sqrt{T(2\tilde{D}+K)B_T})$ dynamic approximate regret. On the other hand, when $B_T$ is not known, we show that the dynamic approximate regret of RGA-META is at most $O((K+\tilde{D})^{1/4}\tilde{B}^{1/2}T^{3/4})$, where $\tilde{B}$ is the maximal path variation budget within each batch of RGA-META (which is provably of order $o(\sqrt{T})$). We also prove that if either the variation budget or the maximal blocking duration is unbounded, the approximate regret will be at least $\Theta(T)$. We also show that the regret upper bound of RGA is tight if the blocking durations are bounded above by $O(1)$.
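The greedy policy analyzed in the abstract above is easy to simulate: at each round play the best currently available arm, then block it while it replenishes. Known rewards (the offline optimization setting) and the particular blocking convention below are assumptions of this toy sketch:

```python
import random

K, T = 5, 200
rewards = [[random.random() for _ in range(K)] for _ in range(T)]  # adversarial table
durations = [random.randint(1, 4) for _ in range(K)]               # per-arm blocking times
blocked_until = [0] * K

total = 0.0
for t in range(T):
    available = [k for k in range(K) if blocked_until[k] <= t]
    if not available:
        continue                              # every arm is blocked this round
    # greedy: play the best currently available arm (assumes rewards are known,
    # i.e. the offline optimization setting discussed in the abstract)
    k = max(available, key=lambda a: rewards[t][a])
    total += rewards[t][k]
    # convention: an arm played at t is blocked for the next durations[k] rounds
    blocked_until[k] = t + 1 + durations[k]
```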
\ No newline at end of file diff --git a/data/2020/neurips/Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion b/data/2020/neurips/Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion new file mode 100644 index 0000000000..2cfca4659b --- /dev/null +++ b/data/2020/neurips/Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion @@ -0,0 +1 @@ +We consider the problem of reconstructing a rank-one matrix from a revealed subset of its entries when some of the revealed entries are corrupted with perturbations that are unknown and can be arbitrarily large. It is not known which revealed entries are corrupted. We propose a new algorithm combining alternating minimization with extreme-value filtering and provide necessary and sufficient conditions to recover the original rank-one matrix. In particular, we show that our proposed algorithm is optimal when the set of revealed entries is given by an Erdős–Rényi random graph. These results are then applied to the problem of classification from crowdsourced data under the assumption that while the majority of the workers are governed by the standard single-coin Dawid-Skene model (i.e., they output the correct answer with a certain probability), some of the workers can deviate arbitrarily from this model. In particular, the "adversarial" workers could even make decisions designed to make the algorithm output an incorrect answer. Extensive experimental results show our algorithm for this problem, based on rank-one matrix completion with perturbations, outperforms all other state-of-the-art methods in such an adversarial scenario. \ No newline at end of file diff --git a/data/2020/neurips/Adversarial Distributional Training for Robust Deep Learning b/data/2020/neurips/Adversarial Distributional Training for Robust Deep Learning new file mode 100644 index 0000000000..952d20ef3b --- /dev/null +++ b/data/2020/neurips/Adversarial Distributional Training for Robust Deep Learning @@ -0,0 +1 @@ +Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples. However, adversarially trained models often do not perform well enough on test data or under attack algorithms unseen during training, which leaves room for improvement. In this paper, we introduce a novel adversarial distributional training (ADT) framework for learning robust models. Specifically, we formulate ADT as a minimax optimization problem, where the inner maximization aims to learn an adversarial distribution to characterize the potential adversarial examples around a natural one, and the outer minimization aims to train robust classifiers by minimizing the expected loss over the worst-case adversarial distributions. We conduct a theoretical analysis on how to solve the minimax problem, leading to a general algorithm for ADT. We further propose three different approaches to parameterize the adversarial distributions. Empirical results on various benchmarks validate the effectiveness of ADT compared with the state-of-the-art AT methods.
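A hedged sketch of the ADT minimax idea from the abstract above: the inner loop fits a Gaussian perturbation distribution (the parameters mu and log_sigma are illustrative names) to maximize the loss around each input via the reparameterization trick, and the outer loop trains on samples from it. The entropy regularizer and the paper's three parameterizations are omitted; this is a minimal PyTorch illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
eps = 0.3                                            # perturbation budget

x = torch.randn(64, 10)                              # toy batch
y = torch.randint(0, 2, (64,))

for step in range(50):
    # inner maximization: fit a Gaussian perturbation distribution per batch
    mu = torch.zeros_like(x, requires_grad=True)
    log_sigma = torch.full_like(x, -2.0, requires_grad=True)
    inner_opt = torch.optim.Adam([mu, log_sigma], lr=0.1)
    for _ in range(5):
        delta = mu + log_sigma.exp() * torch.randn_like(x)  # reparameterized sample
        delta = eps * torch.tanh(delta)                     # keep within the budget
        adv_loss = F.cross_entropy(model(x + delta), y)
        inner_opt.zero_grad()
        (-adv_loss).backward()                              # ascend on the loss
        inner_opt.step()
    # outer minimization: train on samples from the learned adversarial distribution
    with torch.no_grad():
        delta = eps * torch.tanh(mu + log_sigma.exp() * torch.randn_like(x))
    opt.zero_grad()
    F.cross_entropy(model(x + delta), y).backward()
    opt.step()
```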
\ No newline at end of file diff --git a/data/2020/neurips/Adversarial Example Games b/data/2020/neurips/Adversarial Example Games new file mode 100644 index 0000000000..2a509a8c93 --- /dev/null +++ b/data/2020/neurips/Adversarial Example Games @@ -0,0 +1 @@ +The existence of adversarial examples capable of fooling trained neural network classifiers calls for a much better understanding of possible attacks to guide the development of safeguards against them. This includes attack methods in the challenging non-interactive blackbox setting, where adversarial attacks are generated without any access, including queries, to the target model. Prior attacks in this setting have relied mainly on algorithmic innovations derived from empirical observations (e.g., that momentum helps), lacking principled transferability guarantees. In this work, we provide a theoretical foundation for crafting transferable adversarial examples to entire hypothesis classes. We introduce Adversarial Example Games (AEG), a framework that models the crafting of adversarial examples as a min-max game between a generator of attacks and a classifier. AEG provides a new way to design adversarial examples by adversarially training a generator and a classifier from a given hypothesis class (e.g., architecture). We prove that this game has an equilibrium, and that the optimal generator is able to craft adversarial examples that can attack any classifier from the corresponding hypothesis class. We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets, outperforming prior state-of-the-art approaches with an average relative improvement of $29.9\%$ and $47.2\%$ against undefended and robust models (Table 2 & 3) respectively. \ No newline at end of file diff --git a/data/2020/neurips/Adversarial Learning for Robust Deep Clustering b/data/2020/neurips/Adversarial Learning for Robust Deep Clustering new file mode 100644 index 0000000000..4b3fc86a00 --- /dev/null +++ b/data/2020/neurips/Adversarial Learning for Robust Deep Clustering @@ -0,0 +1 @@ +Deep clustering integrates embedding and clustering together to obtain the optimal nonlinear embedding space, which is more effective in real-world scenarios compared with conventional clustering methods. However, the robustness of the clustering network is easily degraded, especially when it encounters an adversarial attack. A small perturbation in the embedding space can lead to very different clustering results, since no labels are available. In this paper, we propose a robust deep clustering method based on adversarial learning. Specifically, we first attempt to define adversarial samples in the embedding space for the clustering network. Meanwhile, we devise an adversarial attack strategy to explore samples that easily fool the clustering layers but do not impact the performance of the deep embedding. We then provide a simple yet efficient defense algorithm to improve the robustness of the clustering network. Experimental results on two popular datasets show that the proposed adversarial learning method can significantly enhance the robustness and further improve the overall clustering performance. In particular, the proposed method is generally applicable to multiple existing clustering frameworks to boost their robustness. The source code is available at https://github.com/xdxuyang/ALRDC .
\ No newline at end of file diff --git a/data/2020/neurips/Adversarial Robustness of Supervised Sparse Coding b/data/2020/neurips/Adversarial Robustness of Supervised Sparse Coding new file mode 100644 index 0000000000..fe19e3f3de --- /dev/null +++ b/data/2020/neurips/Adversarial Robustness of Supervised Sparse Coding @@ -0,0 +1 @@ +Several recent results provide theoretical insights into the phenomena of adversarial examples. Existing results, however, are often limited due to a gap between the simplicity of the models studied and the complexity of those deployed in practice. In this work, we strike a better balance by considering a model that involves learning a representation while at the same time giving a precise generalization bound and a robustness certificate. We focus on the hypothesis class obtained by combining a sparsity-promoting encoder coupled with a linear classifier, and show an interesting interplay between the expressivity and stability of the (supervised) representation map and a notion of margin in the feature space. We bound the robust risk (to $\ell_2$-bounded perturbations) of hypotheses parameterized by dictionaries that achieve a mild encoder gap on training data. Furthermore, we provide a robustness certificate for end-to-end classification. We demonstrate the applicability of our analysis by computing certified accuracy on real data, and compare with other alternatives for certified robustness. \ No newline at end of file diff --git a/data/2020/neurips/Adversarial Self-Supervised Contrastive Learning b/data/2020/neurips/Adversarial Self-Supervised Contrastive Learning new file mode 100644 index 0000000000..fb502012d9 --- /dev/null +++ b/data/2020/neurips/Adversarial Self-Supervised Contrastive Learning @@ -0,0 +1 @@ +Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions, which are then used to augment the training of the model for improved robustness. While some recent works propose semi-supervised adversarial learning methods that utilize unlabeled data, they still require class labels. However, do we really need class labels at all, for adversarially robust training of deep neural networks? In this paper, we propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples. Further, we present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data, which aims to maximize the similarity between a random augmentation of a data sample and its instance-wise adversarial perturbation. We validate our method, Robust Contrastive Learning (RoCL), on multiple benchmark datasets, on which it obtains robust accuracy comparable to state-of-the-art supervised adversarial learning methods, and significantly improved robustness against black-box and unseen types of attacks. Moreover, with further joint fine-tuning with a supervised adversarial loss, RoCL obtains even higher robust accuracy than using self-supervised learning alone. Notably, RoCL also demonstrates impressive results in robust transfer learning.
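The label-free attack underlying RoCL (abstract above) perturbs an input so that its representation disagrees with that of its own augmented view. The sketch below simplifies the instance-wise attack to a PGD loop that minimizes cosine similarity against a single positive view; the actual method uses a contrastive (NT-Xent-style) loss with negatives, and the encoder and augmentation here are toy stand-ins.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
encoder = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))
eps, alpha, steps = 0.1, 0.02, 7

def instance_attack(x, x_aug):
    """PGD that pushes f(x + delta) away from f(x_aug): an instance-level
    attack that needs no class labels, only the sample's own augmented view."""
    delta = torch.zeros_like(x, requires_grad=True)
    target = F.normalize(encoder(x_aug).detach(), dim=1)
    for _ in range(steps):
        z = F.normalize(encoder(x + delta), dim=1)
        loss = -(z * target).sum(dim=1).mean()   # minimize cosine similarity
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # standard PGD ascent step
            delta.clamp_(-eps, eps)              # project back into the budget
        delta.grad.zero_()
    return delta.detach()

x = torch.randn(8, 32)
x_aug = x + 0.05 * torch.randn_like(x)           # stand-in for a random augmentation
delta = instance_attack(x, x_aug)
# training would then maximize similarity between f(x_aug) and f(x + delta)
```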
\ No newline at end of file diff --git a/data/2020/neurips/Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization b/data/2020/neurips/Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization new file mode 100644 index 0000000000..d0538e41bb --- /dev/null +++ b/data/2020/neurips/Adversarial Soft Advantage Fitting: Imitation Learning without Policy Optimization @@ -0,0 +1 @@ +Adversarial imitation learning alternates between learning a discriminator -- which distinguishes the expert's demonstrations from generated ones -- and a generator's policy to produce trajectories that can fool this discriminator. This alternated optimization is known to be delicate in practice since it compounds unstable adversarial training with brittle and sample-inefficient reinforcement learning. We propose to remove the burden of the policy optimization steps by leveraging a novel discriminator formulation. Specifically, our discriminator is explicitly conditioned on two policies: the one from the previous generator's iteration and a learnable policy. When optimized, this discriminator directly learns the optimal generator's policy. Consequently, our discriminator's update solves the generator's optimization problem for free: learning a policy that imitates the expert does not require an additional optimization loop. This formulation effectively cuts by half the implementation and computational burden of adversarial imitation learning algorithms by removing the reinforcement learning phase altogether. We show on a variety of tasks that our simpler approach is competitive to prevalent imitation learning methods. \ No newline at end of file diff --git a/data/2020/neurips/Adversarial Sparse Transformer for Time Series Forecasting b/data/2020/neurips/Adversarial Sparse Transformer for Time Series Forecasting new file mode 100644 index 0000000000..6c357a95ac --- /dev/null +++ b/data/2020/neurips/Adversarial Sparse Transformer for Time Series Forecasting @@ -0,0 +1 @@ +Many approaches have been proposed for time series forecasting, in light of its significance in a wide range of applications including business demand prediction. However, the existing methods suffer from two key limitations. First, most point-prediction models predict only an exact value at each time step, which can hardly capture the stochasticity of the data; even probabilistic prediction based on likelihood estimation suffers from the same problem. Second, most of them use an auto-regressive generative mode, where the ground truth is provided during training but is replaced by the network's own one-step-ahead output during inference; the resulting error accumulation means they may fail to forecast over long time horizons. To solve these issues, in this paper, we propose a new time series forecasting model -- Adversarial Sparse Transformer (AST), based on Generative Adversarial Networks (GANs). Specifically, AST adopts a Sparse Transformer as the generator to learn a sparse attention map for time series forecasting, and uses a discriminator to improve the prediction performance at a sequence level. Extensive experiments on several real-world datasets show the effectiveness and efficiency of our method.
\ No newline at end of file diff --git a/data/2020/neurips/Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation b/data/2020/neurips/Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation new file mode 100644 index 0000000000..699a2a4567 --- /dev/null +++ b/data/2020/neurips/Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation @@ -0,0 +1 @@ +We address the problem of One-Shot Unsupervised Domain Adaptation. Unlike traditional Unsupervised Domain Adaptation, it assumes that only one unlabeled target sample is available when learning to adapt. This setting is realistic but more challenging, and conventional adaptation approaches are prone to failure due to the scarcity of unlabeled target data. To this end, we propose a novel Adversarial Style Mining approach, which combines a style transfer module and a task-specific module in an adversarial manner. Specifically, the style transfer module iteratively searches for harder stylized images around the one-shot target sample according to the current learning state, leading the task model to explore the potential styles that are difficult to solve in the almost unseen target domain, thus boosting the adaptation performance in a data-scarce scenario. The adversarial learning framework makes the style transfer module and task-specific module benefit each other during the competition. Extensive experiments on both cross-domain classification and segmentation benchmarks verify that ASM achieves state-of-the-art adaptation performance under the challenging one-shot setting. \ No newline at end of file diff --git a/data/2020/neurips/Adversarial Training is a Form of Data-dependent Operator Norm Regularization b/data/2020/neurips/Adversarial Training is a Form of Data-dependent Operator Norm Regularization new file mode 100644 index 0000000000..14ae06ffa5 --- /dev/null +++ b/data/2020/neurips/Adversarial Training is a Form of Data-dependent Operator Norm Regularization @@ -0,0 +1 @@ +We establish a theoretical link between adversarial training and operator norm regularization for deep neural networks. Specifically, we prove that $\ell_p$-norm constrained projected gradient ascent based adversarial training with an $\ell_q$-norm loss on the logits of clean and perturbed inputs is equivalent to data-dependent (p, q) operator norm regularization. This fundamental connection confirms the long-standing argument that a network's sensitivity to adversarial examples is tied to its spectral properties and hints at novel ways to robustify and defend against adversarial attacks. We provide extensive empirical evidence on state-of-the-art network architectures to support our theoretical results. \ No newline at end of file diff --git a/data/2020/neurips/Adversarial Weight Perturbation Helps Robust Generalization b/data/2020/neurips/Adversarial Weight Perturbation Helps Robust Generalization new file mode 100644 index 0000000000..8b629d54d6 --- /dev/null +++ b/data/2020/neurips/Adversarial Weight Perturbation Helps Robust Generalization @@ -0,0 +1 @@ +The study of improving the robustness of deep neural networks against adversarial examples has grown rapidly in recent years. Among existing approaches, adversarial training is the most promising one, which flattens the input loss landscape (loss change with respect to input) via training on adversarially perturbed examples. However, how the widely used weight loss landscape (loss change with respect to weight) performs in adversarial training is rarely explored.
In this paper, we investigate the weight loss landscape from a new perspective, and identify a clear correlation between the flatness of the weight loss landscape and the robust generalization gap. Several well-recognized adversarial training improvements, such as early stopping, designing new objective functions, or leveraging unlabeled data, all implicitly flatten the weight loss landscape. Based on these observations, we propose a simple yet effective Adversarial Weight Perturbation (AWP) to explicitly regularize the flatness of the weight loss landscape, forming a double-perturbation mechanism in the adversarial training framework that adversarially perturbs both inputs and weights. Extensive experiments demonstrate that AWP indeed brings a flatter weight loss landscape and can be easily incorporated into various existing adversarial training methods to further boost their adversarial robustness. \ No newline at end of file diff --git a/data/2020/neurips/Adversarial robustness via robust low rank representations b/data/2020/neurips/Adversarial robustness via robust low rank representations new file mode 100644 index 0000000000..a882558e00 --- /dev/null +++ b/data/2020/neurips/Adversarial robustness via robust low rank representations @@ -0,0 +1,4 @@ +Adversarial robustness measures the susceptibility of a classifier to imperceptible perturbations made to the inputs at test time. In this work we highlight the benefits of natural low rank representations that often exist for real data such as images, for training neural networks with certified robustness guarantees. +Our first contribution is for certified robustness to perturbations measured in $\ell_2$ norm. We exploit low rank data representations to provide improved guarantees over state-of-the-art randomized smoothing-based approaches on standard benchmark datasets such as CIFAR-10 and CIFAR-100. +Our second contribution is for the more challenging setting of certified robustness to perturbations measured in $\ell_\infty$ norm. We demonstrate empirically that natural low rank representations have inherent robustness properties, that can be leveraged to provide significantly better guarantees for certified robustness to $\ell_\infty$ perturbations in those representations. Our certificate of $\ell_\infty$ robustness relies on a natural quantity involving the $\infty \to 2$ matrix operator norm associated with the representation, to translate robustness guarantees from $\ell_2$ to $\ell_\infty$ perturbations. +A key technical ingredient for our certification guarantees is a fast algorithm with provable guarantees based on the multiplicative weights update method to provide upper bounds on the above matrix norm. Our algorithmic guarantees improve upon the state of the art for this problem, and may be of independent interest. \ No newline at end of file diff --git a/data/2020/neurips/Adversarially Robust Few-Shot Learning: A Meta-Learning Approach b/data/2020/neurips/Adversarially Robust Few-Shot Learning: A Meta-Learning Approach new file mode 100644 index 0000000000..2f37b1aa3b --- /dev/null +++ b/data/2020/neurips/Adversarially Robust Few-Shot Learning: A Meta-Learning Approach @@ -0,0 +1 @@ +Previous work on adversarially robust neural networks for image classification requires large training sets and computationally expensive training procedures. On the other hand, few-shot learning methods are highly vulnerable to adversarial examples.
The goal of our work is to produce networks that both perform well at few-shot classification tasks and are simultaneously robust to adversarial examples. We develop an algorithm for producing adversarially robust meta-learners, and we thoroughly investigate factors which contribute to adversarial vulnerability. Moreover, our method achieves far superior robust performance on few-shot image classification tasks, such as Mini-ImageNet and CIFAR-FS, compared to robust transfer learning. \ No newline at end of file diff --git a/data/2020/neurips/Adversarially Robust Streaming Algorithms via Differential Privacy b/data/2020/neurips/Adversarially Robust Streaming Algorithms via Differential Privacy new file mode 100644 index 0000000000..6c7913c20d --- /dev/null +++ b/data/2020/neurips/Adversarially Robust Streaming Algorithms via Differential Privacy @@ -0,0 +1 @@ +A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adversarially robust streaming algorithms that outperform the current state-of-the-art constructions for many interesting regimes of parameters. \ No newline at end of file diff --git a/data/2020/neurips/Adversarially-learned Inference via an Ensemble of Discrete Undirected Graphical Models b/data/2020/neurips/Adversarially-learned Inference via an Ensemble of Discrete Undirected Graphical Models new file mode 100644 index 0000000000..552666c042 --- /dev/null +++ b/data/2020/neurips/Adversarially-learned Inference via an Ensemble of Discrete Undirected Graphical Models @@ -0,0 +1 @@ +Undirected graphical models are compact representations of joint probability distributions over random variables. To carry out an inference task of interest, graphical models of arbitrary topology can be trained using empirical risk minimization. However, when faced with new tasks, these models (EGMs) often need to be re-trained. Instead, we propose an inference-agnostic adversarial training framework for producing an ensemble of graphical models (AGMs). The ensemble is optimized to generate data, and inference is learned as a by-product of this endeavor. AGMs perform comparably with EGMs on inference tasks that the latter were specifically optimized for. Most importantly, AGMs show significantly better generalization capabilities across inference tasks. AGMs are also on par with GibbsNet, a state-of-the-art deep neural architecture, which like AGMs, allows conditioning on any subset of random variables. Finally, AGMs allow fast data sampling, competitive with Gibbs sampling from EGMs.
\ No newline at end of file diff --git a/data/2020/neurips/Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity b/data/2020/neurips/Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity new file mode 100644 index 0000000000..0907704f8f --- /dev/null +++ b/data/2020/neurips/Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity @@ -0,0 +1 @@ +The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$. We propose a novel recursion-based algorithm and show that if $\delta = O(\rho/\sqrt{\dim_E})$, then one can find the optimal policy using $O(\dim_E)$ trajectories, where $\rho$ is the gap between the optimal $Q$-value of the best actions and that of the second-best actions and $\dim_E$ is the Eluder dimension of $\mathcal{F}$. Our result has two implications: 1. In conjunction with the lower bound in [Du et al., 2020], our upper bound suggests that the condition $\delta = \tilde{\Theta}(\rho/\sqrt{\dim_E})$ is necessary and sufficient for algorithms with polynomial sample complexity. 2. In conjunction with the obvious lower bound in the tabular case, our upper bound suggests that the sample complexity $\tilde{\Theta}(\dim_E)$ is tight in the agnostic setting. Therefore, we help address the open problem on agnostic $Q$-learning proposed in [Wen and Van Roy, 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results. \ No newline at end of file diff --git a/data/2020/neurips/Agnostic Learning of a Single Neuron with Gradient Descent b/data/2020/neurips/Agnostic Learning of a Single Neuron with Gradient Descent new file mode 100644 index 0000000000..51cde16748 --- /dev/null +++ b/data/2020/neurips/Agnostic Learning of a Single Neuron with Gradient Descent @@ -0,0 +1 @@ +We consider the problem of learning the best-fitting single neuron as measured by the expected square loss $\mathbb{E}_{(x,y)\sim \mathcal{D}}[(\sigma(w^\top x)-y)^2]$ over some unknown joint distribution $\mathcal{D}$ by using gradient descent to minimize the empirical risk induced by a set of i.i.d. samples $S\sim \mathcal{D}^n$. The activation function $\sigma$ is an arbitrary Lipschitz and non-decreasing function, making the optimization problem nonconvex and nonsmooth in general, and covers typical neural network activation functions and inverse link functions in the generalized linear model setting. In the agnostic PAC learning setting, where no assumption on the relationship between the labels $y$ and the input $x$ is made, if the optimal population risk is $\mathsf{OPT}$, we show that gradient descent achieves population risk $O(\mathsf{OPT}^{1/2})+\epsilon$ in polynomial time and sample complexity. When labels take the form $y = \sigma(v^\top x) + \xi$ for zero-mean sub-Gaussian noise $\xi$, we show that gradient descent achieves population risk $\mathsf{OPT} + \epsilon$. Our sample complexity and runtime guarantees are (almost) dimension independent, and when $\sigma$ is strictly increasing and Lipschitz, require no distributional assumptions beyond boundedness.
For ReLU, we show the same results under a nondegeneracy assumption for the marginal distribution of the input. To the best of our knowledge, this is the first result for agnostic learning of a single neuron using gradient descent. \ No newline at end of file diff --git a/data/2020/neurips/Agnostic Learning with Multiple Objectives b/data/2020/neurips/Agnostic Learning with Multiple Objectives new file mode 100644 index 0000000000..3bf21e16da --- /dev/null +++ b/data/2020/neurips/Agnostic Learning with Multiple Objectives @@ -0,0 +1 @@ +Most machine learning tasks are inherently multi-objective. This means that the learner has to come up with a model that performs well across a number of base objectives $L_1, \ldots, L_p$, as opposed to a single one. Since optimizing with respect to multiple objectives at the same time is often computationally expensive, the base objectives are often combined in an ensemble $\sum_{k=1}^{p} \lambda_k L_k$, thereby reducing the problem to scalar optimization. The mixture weights $\lambda_k$ are set to uniform or some other fixed distribution, based on the learner's preferences. We argue that learning with a fixed distribution on the mixture weights runs the risk of overfitting to some individual objectives and significantly harming others, despite performing well on an entire ensemble. Moreover, in reality, the true preferences of a learner across multiple objectives are often unknown or hard to express as a specific distribution. Instead, we propose a new framework of Agnostic Learning with Multiple Objectives (ALMO), where a model is optimized for any weights in the mixture of base objectives. We present data-dependent Rademacher complexity guarantees for learning in the ALMO framework, which are used to guide a scalable optimization algorithm and the corresponding regularization. We present convergence guarantees for this algorithm, assuming convexity of the loss functions and the underlying hypothesis space. We further implement the algorithm in a popular symbolic gradient computation framework and empirically demonstrate on a number of datasets the benefits of the ALMO framework versus learning with a fixed mixture-weight distribution. \ No newline at end of file diff --git a/data/2020/neurips/Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space b/data/2020/neurips/Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space new file mode 100644 index 0000000000..02c1a6024c --- /dev/null +++ b/data/2020/neurips/Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space @@ -0,0 +1 @@ +Distilling knowledge from an ensemble of teacher models is expected to have a more promising performance than that from a single one. Current methods mainly adopt a vanilla average rule, i.e., to simply take the average of all teacher losses for training the student network. However, this approach treats teachers equally and ignores the diversity among them. When conflicts or competition exist among teachers, which is common, the internal compromise might hurt the distillation performance. In this paper, we examine the diversity of teacher models in the gradient space and regard the ensemble knowledge distillation as a multi-objective optimization problem so that we can determine a better optimization direction for the training of student network. Besides, we also introduce a tolerance parameter to accommodate disagreement among teachers.
In this way, our method can be seen as dynamically weighting each teacher in the ensemble. Extensive experiments validate the effectiveness of our method for both logits-based and feature-based cases. \ No newline at end of file diff --git a/data/2020/neurips/Algorithmic recourse under imperfect causal knowledge: a probabilistic approach b/data/2020/neurips/Algorithmic recourse under imperfect causal knowledge: a probabilistic approach new file mode 100644 index 0000000000..e9e4891631 --- /dev/null +++ b/data/2020/neurips/Algorithmic recourse under imperfect causal knowledge: a probabilistic approach @@ -0,0 +1 @@ +Recent work has discussed the limitations of counterfactual explanations to recommend actions for algorithmic recourse, and argued for the need of taking causal relationships between features into consideration. Unfortunately, in practice, the true underlying structural causal model is generally unknown. In this work, we first show that it is impossible to guarantee recourse without access to the true structural equations. To address this limitation, we propose two probabilistic approaches to select optimal actions that achieve recourse with high probability given limited causal knowledge (e.g., only the causal graph). The first captures uncertainty over structural equations under additive Gaussian noise, and uses Bayesian model averaging to estimate the counterfactual distribution. The second removes any assumptions on the structural equations by instead computing the average effect of recourse actions on individuals similar to the person who seeks recourse, leading to a novel subpopulation-based interventional notion of recourse. We then derive a gradient-based procedure for selecting optimal recourse actions, and empirically show that the proposed approaches lead to more reliable recommendations under imperfect causal knowledge than non-probabilistic baselines. \ No newline at end of file diff --git a/data/2020/neurips/All Word Embeddings from One Embedding b/data/2020/neurips/All Word Embeddings from One Embedding new file mode 100644 index 0000000000..dcb6e7472b --- /dev/null +++ b/data/2020/neurips/All Word Embeddings from One Embedding @@ -0,0 +1 @@ +In neural network-based models for natural language processing (NLP), the largest part of the parameters often consists of word embeddings. Conventional models prepare a large embedding matrix whose size depends on the vocabulary size. Therefore, storing these models in memory and on disk is costly. In this study, to reduce the total number of parameters, the embeddings for all words are represented by transforming a shared embedding. The proposed method, ALONE (all word embeddings from one), constructs the embedding of a word by modifying the shared embedding with a filter vector, which is word-specific but non-trainable. Then, we input the constructed embedding into a feed-forward neural network to increase its expressiveness. Naively, the filter vectors occupy the same memory size as the conventional embedding matrix, which depends on the vocabulary size. To solve this issue, we also introduce a memory-efficient filter construction approach. Through an experiment on reconstructing pre-trained word embeddings, we show that ALONE can sufficiently serve as a word representation. We also conduct experiments on NLP application tasks: machine translation and summarization.
We combined ALONE with the current state-of-the-art encoder-decoder model, the Transformer, and achieved comparable scores on WMT 2014 English-to-German translation and DUC 2004 very short summarization with fewer parameters. \ No newline at end of file diff --git a/data/2020/neurips/All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation b/data/2020/neurips/All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation new file mode 100644 index 0000000000..288775b897 --- /dev/null +++ b/data/2020/neurips/All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation @@ -0,0 +1 @@ +We determine statistical and computational limits for estimation of a rank-one matrix (the spike) corrupted by an additive Gaussian noise matrix, in a sparse limit, where the underlying hidden vector (that constructs the rank-one matrix) has a number of non-zero components that scales sub-linearly with the total dimension of the vector, and the signal-to-noise ratio tends to infinity at an appropriate speed. We prove explicit low-dimensional variational formulas for the asymptotic mutual information between the spike and the observed noisy matrix and analyze the approximate message passing algorithm in the sparse regime. For Bernoulli and Bernoulli-Rademacher distributed vectors, and when the sparsity and signal strength satisfy an appropriate scaling relation, we find all-or-nothing phase transitions for the asymptotic minimum and algorithmic mean-square errors. These jump from their maximum possible value to zero at well-defined signal-to-noise thresholds whose asymptotic values we determine exactly. In the asymptotic regime, the statistical-to-algorithmic gap diverges, indicating that sparse recovery is hard for approximate message passing. \ No newline at end of file diff --git a/data/2020/neurips/Almost Optimal Model-Free Reinforcement Learningvia Reference-Advantage Decomposition b/data/2020/neurips/Almost Optimal Model-Free Reinforcement Learningvia Reference-Advantage Decomposition new file mode 100644 index 0000000000..6cdc744af4 --- /dev/null +++ b/data/2020/neurips/Almost Optimal Model-Free Reinforcement Learningvia Reference-Advantage Decomposition @@ -0,0 +1 @@ +We study the reinforcement learning problem in the setting of finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states, $A$ actions, and episode length $H$. We propose a model-free algorithm, UCB-Advantage, and prove that it achieves $\tilde{O}(\sqrt{H^2SAT})$ regret where $T = KH$ and $K$ is the number of episodes to play. Our regret bound improves upon the results of [Jin et al., 2018] and matches the best known model-based algorithms as well as the information-theoretic lower bound up to logarithmic factors. We also show that UCB-Advantage achieves low local switching cost and applies to concurrent reinforcement learning, improving upon the recent results of [Bai et al., 2019]. \ No newline at end of file diff --git a/data/2020/neurips/Almost Surely Stable Deep Dynamics b/data/2020/neurips/Almost Surely Stable Deep Dynamics new file mode 100644 index 0000000000..20a762aa04 --- /dev/null +++ b/data/2020/neurips/Almost Surely Stable Deep Dynamics @@ -0,0 +1 @@ +We introduce a method for learning provably stable deep neural network based dynamic models from observed data.
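Returning to the UCB-Advantage entry above: its starting point is optimistic tabular Q-learning with a count-based bonus in the style of [Jin et al., 2018]. A toy rendition of that baseline update follows; the two-state MDP, constants, and bonus form are illustrative assumptions, and the paper's reference-advantage decomposition refines this scheme rather than being reproduced here.

```python
import numpy as np

S, A, H, K = 2, 2, 3, 500            # states, actions, horizon, episodes
rng = np.random.default_rng(0)
Q = np.full((H, S, A), float(H))     # optimistic initialization
N = np.zeros((H, S, A))

def step(s, a):
    # Assumed toy dynamics: action 1 in state 1 pays off and flips the state.
    r = 1.0 if (s, a) == (1, 1) else 0.1
    s2 = int(rng.integers(S)) if a == 0 else 1 - s
    return r, s2

for _ in range(K):
    s = 0
    for h in range(H):
        a = int(np.argmax(Q[h, s]))
        r, s2 = step(s, a)
        N[h, s, a] += 1
        n = N[h, s, a]
        alpha = (H + 1) / (H + n)                 # stepsize from Jin et al.
        bonus = np.sqrt(H ** 3 * np.log(K) / n)   # Hoeffding-style optimism
        v_next = 0.0 if h == H - 1 else min(H, Q[h + 1, s2].max())
        Q[h, s, a] += alpha * (r + v_next + bonus - Q[h, s, a])
        s = s2

print(Q[0])   # optimistic value estimates at the first timestep
```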
Specifically, we consider discrete-time stochastic dynamic models, as they are of particular interest in practical applications such as estimation and control. However, these properties exacerbate the challenge of guaranteeing stability. Our method works by embedding a Lyapunov neural network into the dynamic model, thereby inherently satisfying the stability criterion. To this end, we propose two approaches and apply them in both the deterministic and stochastic settings: one exploits convexity of the Lyapunov function, while the other enforces stability through an implicit output layer. We demonstrate the utility of each approach through numerical examples. \ No newline at end of file diff --git a/data/2020/neurips/An Analysis of SVD for Deep Rotation Estimation b/data/2020/neurips/An Analysis of SVD for Deep Rotation Estimation new file mode 100644 index 0000000000..dcac791e2b --- /dev/null +++ b/data/2020/neurips/An Analysis of SVD for Deep Rotation Estimation @@ -0,0 +1 @@ +Symmetric orthogonalization via SVD, and closely related procedures, are well-known techniques for projecting matrices onto $O(n)$ or $SO(n)$. These tools have long been used for applications in computer vision, for example optimal 3D alignment problems solved by orthogonal Procrustes, rotation averaging, or Essential matrix decomposition. Despite its utility in different settings, SVD orthogonalization as a procedure for producing rotation matrices is typically overlooked in deep learning models, where the preferences tend toward classic representations like unit quaternions, Euler angles, and axis-angle, or more recently introduced methods. Despite the importance of 3D rotations in computer vision and robotics, a single universally effective representation is still missing. Here, we explore the viability of SVD orthogonalization for 3D rotations in neural networks. We present a theoretical analysis that shows SVD is the natural choice for projecting onto the rotation group. Our extensive quantitative analysis shows that simply replacing existing representations with the SVD orthogonalization procedure achieves state-of-the-art performance in many deep learning applications covering both supervised and unsupervised training. \ No newline at end of file diff --git a/data/2020/neurips/An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits b/data/2020/neurips/An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits new file mode 100644 index 0000000000..c919fac233 --- /dev/null +++ b/data/2020/neurips/An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits @@ -0,0 +1 @@ +In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit the structure of the problem and have been shown to be asymptotically suboptimal. In this paper, we follow recent approaches of deriving asymptotically optimal algorithms from problem-dependent regret lower bounds, and we introduce a novel algorithm improving over the state-of-the-art along multiple dimensions. We build on a reformulation of the lower bound, where context distribution and exploration policy are decoupled, and we obtain an algorithm robust to unbalanced context distributions. Then, using an incremental primal-dual approach to solve the Lagrangian relaxation of the lower bound, we obtain a scalable and computationally efficient algorithm.
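The SVD orthogonalization at the heart of the rotation-estimation entry above is compact enough to state in full: project any 3x3 matrix to the nearest rotation, flipping the sign of the last singular direction when needed so the determinant is +1. A self-contained sketch:

```python
import numpy as np

def special_procrustes(M: np.ndarray) -> np.ndarray:
    # Nearest matrix in SO(3) under the Frobenius norm.
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))     # -1 means a reflection: fix it
    return U @ np.diag([1.0, 1.0, d]) @ Vt

M = np.random.default_rng(0).normal(size=(3, 3))   # e.g. a raw network output
R = special_procrustes(M)
print(np.allclose(R @ R.T, np.eye(3)), round(float(np.linalg.det(R)), 6))
```

Used as the final layer of a network that regresses nine numbers, this is the drop-in replacement the abstract compares against quaternion, Euler-angle, and axis-angle outputs.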
In addition, we remove forced exploration and build on confidence intervals of the optimization problem to encourage a minimum level of exploration that is better adapted to the problem structure. We demonstrate the asymptotic optimality of our algorithm, while providing both problem-dependent and worst-case finite-time regret guarantees. Our bounds scale with the logarithm of the number of arms, thus avoiding the linear dependence common in all related prior works. Notably, we establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits. Finally, we verify that our algorithm obtains better empirical performance than state-of-the-art baselines. \ No newline at end of file diff --git a/data/2020/neurips/An Efficient Adversarial Attack for Tree Ensembles b/data/2020/neurips/An Efficient Adversarial Attack for Tree Ensembles new file mode 100644 index 0000000000..6783debacc --- /dev/null +++ b/data/2020/neurips/An Efficient Adversarial Attack for Tree Ensembles @@ -0,0 +1 @@ +We study the problem of efficient adversarial attacks on tree-based ensembles such as gradient boosting decision trees (GBDTs) and random forests (RFs). Since these models are non-continuous step functions and gradients do not exist, most existing efficient adversarial attacks are not applicable. Although decision-based black-box attacks can be applied, they cannot utilize the special structure of trees. In our work, we transform the attack problem into a discrete search problem specially designed for tree ensembles, where the goal is to find a valid "leaf tuple" that leads to misclassification while having the shortest distance to the original input. With this formulation, we show that a simple yet effective greedy algorithm can be applied to iteratively optimize the adversarial example by moving the leaf tuple to its neighborhood within Hamming distance 1. Experimental results on several large GBDT and RF models with up to hundreds of trees demonstrate that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach, while also providing smaller (better) adversarial examples than decision-based black-box attacks on general $\ell_p$ ($p=1, 2, \infty$) norm perturbations. Our code is available at this https URL. \ No newline at end of file diff --git a/data/2020/neurips/An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search b/data/2020/neurips/An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search new file mode 100644 index 0000000000..67dc81fdb6 --- /dev/null +++ b/data/2020/neurips/An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search @@ -0,0 +1 @@ +Deep reinforcement learning (DRL) algorithms and evolution strategies (ES) have been applied to various tasks, showing excellent performance. The two have complementary properties: DRL offers good sample efficiency but poor stability, while ES offers the reverse. Recently, there have been attempts to combine these algorithms, but these methods rely fully on a synchronous update scheme, which is not ideal for maximizing the benefits of the parallelism in ES. To address this challenge, an asynchronous update scheme was introduced, which offers good time efficiency and diverse policy exploration.
In this paper, we introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL) method that maximizes the parallel efficiency of ES and integrates it with policy gradient methods. Specifically, we propose 1) a novel framework to merge ES and DRL asynchronously and 2) various asynchronous update methods that can take full advantage of asynchronism, ES, and DRL: exploration and time efficiency, stability, and sample efficiency, respectively. The proposed framework and update methods are evaluated on continuous control benchmarks, showing superior performance as well as time efficiency compared to previous methods. \ No newline at end of file diff --git a/data/2020/neurips/An Efficient Framework for Clustered Federated Learning b/data/2020/neurips/An Efficient Framework for Clustered Federated Learning new file mode 100644 index 0000000000..408c0abc38 --- /dev/null +++ b/data/2020/neurips/An Efficient Framework for Clustered Federated Learning @@ -0,0 +1 @@ +We address the problem of federated learning (FL) where users are distributed and partitioned into clusters. This setup captures settings where different groups of users have their own objectives (learning tasks) but by aggregating their data with others in the same cluster (same learning task), they can leverage the strength in numbers in order to perform more efficient federated learning. For this new framework of clustered federated learning, we propose the Iterative Federated Clustering Algorithm (IFCA), which alternately estimates the cluster identities of the users and optimizes model parameters for the user clusters via gradient descent. We analyze the convergence rate of this algorithm first in a linear model with squared loss and then for generic strongly convex and smooth loss functions. We show that in both settings, with good initialization, IFCA is guaranteed to converge, and discuss the optimality of the statistical error rate. In particular, for the linear model with two clusters, we can guarantee that our algorithm converges as long as the initialization is slightly better than random. When the clustering structure is ambiguous, we propose to train the models by combining IFCA with the weight sharing technique in multi-task learning. In the experiments, we show that our algorithm can succeed even if we relax the requirements on initialization with random initialization and multiple restarts. We also present experimental results showing that our algorithm is efficient in non-convex problems such as neural networks. We demonstrate the benefits of IFCA over the baselines on several clustered FL benchmarks. \ No newline at end of file diff --git a/data/2020/neurips/An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits b/data/2020/neurips/An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits new file mode 100644 index 0000000000..1a5c9168a7 --- /dev/null +++ b/data/2020/neurips/An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits @@ -0,0 +1 @@ +This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings. Leveraging ideas from the theory of suprema of empirical processes, we provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
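A compact sketch of the IFCA alternation described above, under strong simplifying assumptions (two clusters, linear models, squared loss, full-batch gradients, synthetic users); as the abstract notes, random initialization may require multiple restarts, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
d, users_per_cluster, n = 5, 10, 50
true = [rng.normal(size=d), rng.normal(size=d)]       # one model per cluster
data = []
for w_star in true:
    for _ in range(users_per_cluster):
        X = rng.normal(size=(n, d))
        data.append((X, X @ w_star + 0.1 * rng.normal(size=n)))

W = [rng.normal(size=d) for _ in range(2)]            # cluster estimates
for _ in range(50):
    grads, counts = [np.zeros(d), np.zeros(d)], [0, 0]
    for X, y in data:
        # Step 1: each user adopts the cluster model with the lowest loss.
        j = int(np.argmin([np.mean((X @ w - y) ** 2) for w in W]))
        # Step 2: the user contributes a gradient to that cluster's model.
        grads[j] += 2 * X.T @ (X @ W[j] - y) / n
        counts[j] += 1
    for j in range(2):
        if counts[j]:
            W[j] -= 0.05 * grads[j] / counts[j]

best = np.mean([min(np.mean((X @ w - y) ** 2) for w in W) for X, y in data])
print("mean best-cluster loss:", best)   # small if both clusters were found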
Unlike previous approaches which sample based on minimizing a worst-case variance (e.g. G-optimal design), we define an experimental design objective based on the Gaussian-width of the underlying arm set. We provide a novel lower bound in terms of this objective that highlights its fundamental role in the sample complexity. The sample complexity of our fixed confidence algorithm matches this lower bound, and in addition is computationally efficient for combinatorial classes, e.g. shortest-path, matchings and matroids, where the arm sets can be exponentially large in the dimension. Finally, we propose the first algorithm for linear bandits in the fixed budget setting. Its guarantee matches our lower bound up to logarithmic factors. \ No newline at end of file diff --git a/data/2020/neurips/An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay b/data/2020/neurips/An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay new file mode 100644 index 0000000000..fcdf04ec7d --- /dev/null +++ b/data/2020/neurips/An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay @@ -0,0 +1 @@ +Prioritized Experience Replay (PER) is a deep reinforcement learning technique in which agents learn from transitions sampled with non-uniform probability proportionate to their temporal-difference error. We show that any loss function evaluated with non-uniformly sampled data can be transformed into another uniformly sampled loss function with the same expected gradient. Surprisingly, we find in some environments PER can be replaced entirely by this new loss function without impact on empirical performance. Furthermore, this relationship suggests a new branch of improvements to PER by correcting its uniformly sampled loss function equivalent. We demonstrate the effectiveness of our proposed modifications to PER and the equivalent loss function in several MuJoCo and Atari environments. \ No newline at end of file diff --git a/data/2020/neurips/An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch b/data/2020/neurips/An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch new file mode 100644 index 0000000000..cf84af4c9e --- /dev/null +++ b/data/2020/neurips/An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch @@ -0,0 +1 @@ +We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning. This problem is particularly important in sim-to-real transfer because simulators inevitably model real-world dynamics imperfectly. In this paper, we show that one existing solution to this transfer problem, grounded action transformation, is closely related to the problem of imitation from observation (IfO): learning behaviors that mimic the observations of behavior demonstrations. After establishing this relationship, we hypothesize that recent state-of-the-art approaches from the IfO literature can be effectively repurposed for grounded transfer learning. To validate our hypothesis, we derive a new algorithm, generative adversarial reinforced action transformation (GARAT), based on adversarial imitation from observation techniques.
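The equivalence claimed in the experience-replay entry above can be checked numerically in a few lines: sampling index i with probability p_i under the plain loss has the same expected gradient as uniform sampling under the reweighted loss N * p_i * L_i. The scalar squared losses are an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, w = 5, 1.5
targets = rng.normal(size=N)
p = rng.random(N)
p /= p.sum()                        # non-uniform priorities, as in PER

grads = 2 * (w - targets)           # per-example gradient of (w - t_i)^2

exact = np.sum(p * grads)                        # true expected gradient
idx_p = rng.choice(N, size=100_000, p=p)         # prioritized sampling
idx_u = rng.integers(N, size=100_000)            # uniform sampling
print(exact)
print(grads[idx_p].mean())                       # plain loss, non-uniform
print((N * p[idx_u] * grads[idx_u]).mean())      # reweighted loss, uniform
```

All three printed values agree up to Monte Carlo error, which is the content of the equivalence.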
We run experiments in several domains with mismatched dynamics, and find that agents trained with GARAT achieve higher returns in the target environment compared to existing black-box transfer methods. \ No newline at end of file diff --git a/data/2020/neurips/An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods b/data/2020/neurips/An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods new file mode 100644 index 0000000000..296f07f814 --- /dev/null +++ b/data/2020/neurips/An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods @@ -0,0 +1 @@ +In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG (NPG) methods, and their variance-reduced variants, under general smooth policy parametrizations. More specifically, with the Fisher information matrix of the policy being positive definite: i) we show that a state-of-the-art variance-reduced PG method, which has only been shown to converge to stationary points, converges to the globally optimal value up to some inherent function approximation error due to policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance-reduction into the NPG update. Our improvements follow from an observation that the convergence of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG can be applied to NPG as well, and the global convergence analysis of NPG can help to establish the global convergence of (variance-reduced) PG methods. Our analysis carefully integrates the advantages of these two lines of work. Thanks to this improvement, we have also made variance-reduction for NPG possible, with both global convergence and an efficient finite-sample complexity. \ No newline at end of file diff --git a/data/2020/neurips/An Improved Analysis of Stochastic Gradient Descent with Momentum b/data/2020/neurips/An Improved Analysis of Stochastic Gradient Descent with Momentum new file mode 100644 index 0000000000..a4a4320d2a --- /dev/null +++ b/data/2020/neurips/An Improved Analysis of Stochastic Gradient Descent with Momentum @@ -0,0 +1 @@ +SGD with momentum (SGDM) has been widely applied in many machine learning tasks, and it is often applied with dynamic stepsizes and momentum weights tuned in a stagewise manner. Despite its empirical advantage over SGD, the role of momentum is still unclear in general since previous analyses of SGDM either provide worse convergence bounds than those of SGD, or assume Lipschitz or quadratic objectives, which fail to hold in practice. Furthermore, the role of dynamic parameters has not been addressed. In this work, we show that SGDM converges as fast as SGD for smooth objectives in both strongly convex and nonconvex settings. We also prove that a multistage strategy is beneficial for SGDM compared to using fixed parameters. Finally, we verify these theoretical claims by numerical experiments.
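The update analyzed in the SGDM entry above, with the multistage schedule it advocates: run heavy-ball SGD in stages and cut the stepsize at each stage boundary. The quadratic objective and schedule values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w, v, beta = np.array([5.0, -3.0]), np.zeros(2), 0.9

for lr, steps in [(0.1, 100), (0.01, 100), (0.001, 100)]:  # multistage schedule
    for _ in range(steps):
        g = w + 0.1 * rng.normal(size=2)   # noisy gradient of 0.5 * ||w||^2
        v = beta * v + g                   # momentum buffer
        w = w - lr * v

print(w)                                   # near the optimum at the origin
```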
\ No newline at end of file diff --git a/data/2020/neurips/An Optimal Elimination Algorithm for Learning a Best Arm b/data/2020/neurips/An Optimal Elimination Algorithm for Learning a Best Arm new file mode 100644 index 0000000000..aa55097abe --- /dev/null +++ b/data/2020/neurips/An Optimal Elimination Algorithm for Learning a Best Arm @@ -0,0 +1 @@ +We consider the classic problem of $(\epsilon,\delta)$-PAC learning a best arm where the goal is to identify with confidence $1-\delta$ an arm whose mean is an $\epsilon$-approximation to that of the highest mean arm in a multi-armed bandit setting. This problem is one of the most fundamental problems in statistics and learning theory, yet somewhat surprisingly its worst-case sample complexity is not well understood. In this paper, we propose a new approach for $(\epsilon,\delta)$-PAC learning a best arm. This approach leads to an algorithm whose sample complexity converges to \emph{exactly} the optimal sample complexity of $(\epsilon,\delta)$-learning the mean of $n$ arms separately, and we complement this result with a conditional matching lower bound. More specifically: \ No newline at end of file diff --git a/data/2020/neurips/An Unbiased Risk Estimator for Learning with Augmented Classes b/data/2020/neurips/An Unbiased Risk Estimator for Learning with Augmented Classes new file mode 100644 index 0000000000..41213d9155 --- /dev/null +++ b/data/2020/neurips/An Unbiased Risk Estimator for Learning with Augmented Classes @@ -0,0 +1 @@ +In this paper, we study the problem of learning with augmented classes (LAC), where new classes that do not appear in the training dataset might emerge in the testing phase. The mixture of known classes and new classes in the testing distribution makes the LAC problem quite challenging. Our discovery is that by exploiting cheap and vast unlabeled data, the testing distribution can be estimated in the training stage, which paves the way for developing algorithms with nice statistical properties. Specifically, we propose an unbiased risk estimator over the testing distribution for the LAC problem, and further develop an efficient algorithm to perform the empirical risk minimization. Both asymptotic and non-asymptotic analyses are provided as theoretical guarantees. The efficacy of the proposed algorithm is also confirmed by experiments. \ No newline at end of file diff --git a/data/2020/neurips/An Unsupervised Information-Theoretic Perceptual Quality Metric b/data/2020/neurips/An Unsupervised Information-Theoretic Perceptual Quality Metric new file mode 100644 index 0000000000..1a5dee1b1e --- /dev/null +++ b/data/2020/neurips/An Unsupervised Information-Theoretic Perceptual Quality Metric @@ -0,0 +1 @@ +Tractable models of human perception have proved to be challenging to build. Hand-designed models such as MS-SSIM remain popular predictors of human image quality judgements due to their simplicity and speed. Recent modern deep learning approaches can perform better, but they rely on supervised data which can be costly to gather: large sets of class labels such as ImageNet, image quality ratings, or both. We combine recent advances in information-theoretic objective functions with a computational architecture informed by the physiology of the human visual system and unsupervised training on pairs of video frames, yielding our Perceptual Information Metric (PIM). We show that PIM is competitive with supervised metrics on the recent and challenging BAPPS image quality assessment dataset.
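For orientation on the best-arm entry above: the textbook elimination scheme it refines samples all surviving arms in rounds and drops any arm whose empirical mean falls well below the current leader. The confidence radius below uses standard Hoeffding-style constants, an assumption standing in for the paper's optimized choices.

```python
import numpy as np

def successive_elimination(means, eps=0.1, delta=0.05, seed=0):
    # Pull every surviving arm once per round, then drop arms whose
    # empirical mean is more than 2*rad below the current leader.
    rng = np.random.default_rng(seed)
    n = len(means)
    alive = list(range(n))
    sums = np.zeros(n)
    t = 0
    while len(alive) > 1:
        t += 1
        for i in alive:
            sums[i] += rng.normal(means[i], 1.0)   # unit-variance rewards
        mu = {i: sums[i] / t for i in alive}
        rad = np.sqrt(2 * np.log(4 * n * t * t / delta) / t)
        best = max(mu.values())
        alive = [i for i in alive if mu[i] + 2 * rad > best]
        if 2 * rad <= eps:       # every survivor is eps-optimal w.h.p.
            break
    return max(alive, key=lambda i: sums[i] / t)

print(successive_elimination([0.2, 0.5, 0.45, 0.1]))   # arm 1 or arm 2
```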
We also perform qualitative experiments using the ImageNet-C dataset, and establish that our approach is robust with respect to architectural details. \ No newline at end of file diff --git a/data/2020/neurips/An analytic theory of shallow networks dynamics for hinge loss classification b/data/2020/neurips/An analytic theory of shallow networks dynamics for hinge loss classification new file mode 100644 index 0000000000..28999d6155 --- /dev/null +++ b/data/2020/neurips/An analytic theory of shallow networks dynamics for hinge loss classification @@ -0,0 +1 @@ +Neural networks have been shown to perform incredibly well in classification tasks over structured high-dimensional datasets. However, the learning dynamics of such networks is still poorly understood. In this paper we study in detail the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task. We show that in a suitable mean-field limit this case maps to a single-node learning problem with a time-dependent dataset determined self-consistently from the average node population. We specialize our theory to the prototypical case of linearly separable data and a linear hinge loss, for which the dynamics can be explicitly solved in the infinite dataset limit. This allows us to address in a simple setting several phenomena appearing in modern networks such as slowing down of training dynamics, crossover between rich and lazy learning, and overfitting. Finally, we assess the limitations of mean-field theory by studying the case of a large but finite number of nodes and training samples. \ No newline at end of file diff --git a/data/2020/neurips/An efficient nonconvex reformulation of stagewise convex optimization problems b/data/2020/neurips/An efficient nonconvex reformulation of stagewise convex optimization problems new file mode 100644 index 0000000000..b675c1b39c --- /dev/null +++ b/data/2020/neurips/An efficient nonconvex reformulation of stagewise convex optimization problems @@ -0,0 +1 @@ +Convex optimization problems with staged structure appear in several contexts, including optimal control, verification of deep neural networks, and isotonic regression. Off-the-shelf solvers can solve these problems but may scale poorly. We develop a nonconvex reformulation designed to exploit this staged structure. Our reformulation has only simple bound constraints, enabling solution via projected gradient methods and their accelerated variants. The method automatically generates a sequence of primal and dual feasible solutions to the original convex problem, making optimality certification easy. We establish theoretical properties of the nonconvex formulation, showing that it is (almost) free of spurious local minima and has the same global optimum as the convex problem. We modify projected gradient descent (PGD) to avoid spurious local minimizers so that it always converges to the global minimizer. For neural network verification, our approach obtains small duality gaps in only a few gradient steps. Consequently, it can solve large-scale verification problems faster than both off-the-shelf and specialized solvers.
\ No newline at end of file diff --git a/data/2020/neurips/An implicit function learning approach for parametric modal regression b/data/2020/neurips/An implicit function learning approach for parametric modal regression new file mode 100644 index 0000000000..e98be9eaf0 --- /dev/null +++ b/data/2020/neurips/An implicit function learning approach for parametric modal regression @@ -0,0 +1 @@ +For multi-valued functions---such as when the conditional distribution on targets given the inputs is multi-modal---standard regression approaches are not always desirable because they provide the conditional mean. Modal regression algorithms address this issue by instead finding the conditional mode(s). Most, however, are nonparametric approaches and so can be difficult to scale. By contrast, parametric approximators, like neural networks, facilitate learning complex relationships between inputs and targets. In this work, we propose a parametric modal regression algorithm. We use the implicit function theorem to develop an objective for learning a joint function over inputs and targets. We empirically demonstrate on several synthetic problems that our method (i) can learn multi-valued functions and produce the conditional modes, (ii) scales well to high-dimensional inputs, and (iii) can even be more effective for certain uni-modal problems, particularly for high-frequency functions. We demonstrate that our method is competitive on a real-world modal regression problem and two regular regression datasets. \ No newline at end of file diff --git a/data/2020/neurips/An operator view of policy gradient methods b/data/2020/neurips/An operator view of policy gradient methods new file mode 100644 index 0000000000..15614d0742 --- /dev/null +++ b/data/2020/neurips/An operator view of policy gradient methods @@ -0,0 +1 @@ +We cast policy gradient methods as the repeated application of two operators: a policy improvement operator $\mathcal{I}$, which maps any policy $\pi$ to a better one $\mathcal{I}\pi$, and a projection operator $\mathcal{P}$, which finds the best approximation of $\mathcal{I}\pi$ in the set of realizable policies. We use this framework to introduce operator-based versions of traditional policy gradient methods such as REINFORCE and PPO, which leads to a better understanding of their original counterparts. We also use the understanding we develop of the role of $\mathcal{I}$ and $\mathcal{P}$ to propose a new global lower bound of the expected return. This new perspective allows us to further bridge the gap between policy-based and value-based methods, showing how REINFORCE and the Bellman optimality operator, for example, can be seen as two sides of the same coin. \ No newline at end of file diff --git a/data/2020/neurips/Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring b/data/2020/neurips/Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring new file mode 100644 index 0000000000..751bcb2f8f --- /dev/null +++ b/data/2020/neurips/Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring @@ -0,0 +1 @@ +We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback. While Thompson sampling is one of the most promising algorithms for a variety of online decision-making problems, its properties for stochastic partial monitoring have not been theoretically investigated, and the existing algorithm relies on a heuristic approximation of the posterior distribution.
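For contrast with the heuristic posterior approximation just mentioned: in the plain Bernoulli bandit, Thompson sampling can draw the target parameter from its exact Beta posterior. A minimal bandit-only sketch (not the partial-monitoring algorithm of the paper, whose exact sampling is the contribution described next):

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.3, 0.55, 0.6])    # unknown arm means (assumed)
a = np.ones(3)                          # Beta(1, 1) prior per arm
b = np.ones(3)

for t in range(5000):
    theta = rng.beta(a, b)              # exact posterior sample per arm
    arm = int(np.argmax(theta))
    reward = rng.random() < true_p[arm]
    a[arm] += reward                    # conjugate posterior update
    b[arm] += 1 - reward

print("posterior means:", a / (a + b))  # concentrates on the best arm
```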
To mitigate these problems, we present a novel Thompson-sampling-based algorithm, which enables us to exactly sample the target parameter from the posterior distribution. Moreover, we prove that the new algorithm achieves the logarithmic problem-dependent expected pseudo-regret $\mathrm{O}(\log T)$ for a linearized variant of the problem with local observability. This result is the first regret bound of Thompson sampling for partial monitoring, which also becomes the first logarithmic regret bound of Thompson sampling for linear bandits. \ No newline at end of file diff --git a/data/2020/neurips/Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry b/data/2020/neurips/Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry new file mode 100644 index 0000000000..7433154446 --- /dev/null +++ b/data/2020/neurips/Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry @@ -0,0 +1 @@ +We consider the optimization problem associated with fitting two-layer ReLU networks with respect to the squared loss, where labels are generated by a target network. We leverage the rich symmetry structure to analytically characterize the Hessian at various families of spurious minima in the natural regime where the number of inputs $d$ and the number of hidden neurons $k$ are finite. In particular, we prove that for $d\ge k$ standard Gaussian inputs: (a) of the $dk$ eigenvalues of the Hessian, $dk - O(d)$ concentrate near zero, (b) $\Omega(d)$ of the eigenvalues grow linearly with $k$. Although this phenomenon of an extremely skewed spectrum has been observed many times before, to our knowledge, this is the first time it has been established rigorously. Our analytic approach uses techniques, new to the field, from symmetry breaking and representation theory, and carries important implications for our ability to argue about statistical generalization through local curvature. \ No newline at end of file diff --git a/data/2020/neurips/Analytical Probability Distributions and Exact Expectation-Maximization for Deep Generative Networks b/data/2020/neurips/Analytical Probability Distributions and Exact Expectation-Maximization for Deep Generative Networks new file mode 100644 index 0000000000..7403533fc9 --- /dev/null +++ b/data/2020/neurips/Analytical Probability Distributions and Exact Expectation-Maximization for Deep Generative Networks @@ -0,0 +1 @@ +Deep Generative Networks (DGNs) with probabilistic modeling of their output and latent space are currently trained via Variational Autoencoders (VAEs). In the absence of a known analytical form for the posterior and likelihood expectation, VAEs resort to approximations, including (Amortized) Variational Inference (AVI) and Monte-Carlo sampling. We exploit the Continuous Piecewise Affine property of modern DGNs to derive their posterior and marginal distributions as well as the latter’s first two moments. These findings enable us to derive an analytical Expectation-Maximization (EM) algorithm for gradient-free DGN learning. We demonstrate empirically that EM training of DGNs produces greater likelihood than VAE training. Our new framework will guide the design of new VAE AVI methods that better approximate the true posterior and open new avenues to apply standard statistical tools for model comparison, anomaly detection, and missing data imputation.
\ No newline at end of file diff --git a/data/2020/neurips/Applications of Common Entropy for Causal Inference b/data/2020/neurips/Applications of Common Entropy for Causal Inference new file mode 100644 index 0000000000..9b5b4e1534 --- /dev/null +++ b/data/2020/neurips/Applications of Common Entropy for Causal Inference @@ -0,0 +1 @@ +We study the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. The minimum entropy required for such a latent is known as common entropy in information theory. We extend this notion to Rényi common entropy by minimizing the Rényi entropy of the latent variable. To efficiently compute common entropy, we propose an iterative algorithm that can be used to discover the trade-off between the entropy of the latent variable and the conditional mutual information of the observed variables. We show two applications of common entropy in causal inference: First, under the assumption that there are no low-entropy mediators, it can be used to distinguish causation from spurious correlation among almost all joint distributions on simple causal graphs with two observed variables. Second, common entropy can be used to improve constraint-based methods such as PC or FCI algorithms in the small-sample regime, where these methods are known to struggle. We propose a modification to these constraint-based methods to assess, using common entropy, whether a separating set found by these algorithms is valid. We finally evaluate our algorithms on synthetic and real data to establish their performance. \ No newline at end of file diff --git a/data/2020/neurips/Approximate Cross-Validation for Structured Models b/data/2020/neurips/Approximate Cross-Validation for Structured Models new file mode 100644 index 0000000000..a37245a865 --- /dev/null +++ b/data/2020/neurips/Approximate Cross-Validation for Structured Models @@ -0,0 +1 @@ +Many modern data analyses benefit from explicitly modeling dependence structure in data -- such as measurements across time or space, ordered words in a sentence, or genes in a genome. Cross-validation is the gold standard to evaluate these analyses but can be prohibitively slow due to the need to re-run already-expensive learning algorithms many times. Previous work has shown approximate cross-validation (ACV) methods provide a fast and provably accurate alternative in the setting of empirical risk minimization. But this existing ACV work is restricted to simpler models by the assumptions that (i) data are independent and (ii) an exact initial model fit is available. In structured data analyses, (i) is always untrue, and (ii) is often untrue. In the present work, we address (i) by extending ACV to models with dependence structure. To address (ii), we verify -- both theoretically and empirically -- that ACV quality deteriorates smoothly with noise in the initial fit. We demonstrate the accuracy and computational benefits of our proposed methods on a diverse set of real-world applications.
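A useful reference point for both ACV entries here is the classical single-fit shortcut they generalize: for ridge regression, exact leave-one-out residuals follow from one fit via the hat matrix, e_i = r_i / (1 - h_ii). The check below verifies this identity against brute-force refits; problem sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 40, 5, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

G = X.T @ X + lam * np.eye(d)
H = X @ np.linalg.solve(G, X.T)          # hat (smoother) matrix, one fit
r = y - H @ y
loo_fast = r / (1 - np.diag(H))          # all n held-out residuals at once

loo_slow = np.empty(n)                   # the n refits ACV tries to avoid
for i in range(n):
    m = np.arange(n) != i
    w = np.linalg.solve(X[m].T @ X[m] + lam * np.eye(d), X[m].T @ y[m])
    loo_slow[i] = y[i] - X[i] @ w

print(np.allclose(loo_fast, loo_slow))   # True
```

ACV methods extend this "one fit, all folds" pattern to models where no closed form exists, which is why the Hessian (the analogue of G here) is their computational bottleneck.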
\ No newline at end of file diff --git a/data/2020/neurips/Approximate Cross-Validation with Low-Rank Data in High Dimensions b/data/2020/neurips/Approximate Cross-Validation with Low-Rank Data in High Dimensions new file mode 100644 index 0000000000..018a6ea8d8 --- /dev/null +++ b/data/2020/neurips/Approximate Cross-Validation with Low-Rank Data in High Dimensions @@ -0,0 +1 @@ +Many recent advances in machine learning are driven by a challenging trifecta: large data size $N$; high dimensions; and expensive algorithms. In this setting, cross-validation (CV) serves as an important tool for model assessment. Recent advances in approximate cross validation (ACV) provide accurate approximations to CV with only a single model fit, avoiding traditional CV's requirement for repeated runs of expensive algorithms. Unfortunately, these ACV methods can lose both speed and accuracy in high dimensions -- unless sparsity structure is present in the data. Fortunately, there is an alternative type of simplifying structure that is present in most data: approximate low rank (ALR). Guided by this observation, we develop a new algorithm for ACV that is fast and accurate in the presence of ALR data. Our first key insight is that the Hessian matrix -- whose inverse forms the computational bottleneck of existing ACV methods -- is ALR. We show that, despite our use of the \emph{inverse} Hessian, a low-rank approximation using the largest (rather than the smallest) matrix eigenvalues enables fast, reliable ACV. Our second key insight is that, in the presence of ALR data, error in existing ACV methods roughly grows with the (approximate, low) rank rather than with the (full, high) dimension. These insights allow us to prove theoretical guarantees on the quality of our proposed algorithm -- along with fast-to-compute upper bounds on its error. We demonstrate the speed and accuracy of our method, as well as the usefulness of our bounds, on a range of real and simulated data sets. \ No newline at end of file diff --git a/data/2020/neurips/Approximate Heavily-Constrained Learning with Lagrange Multiplier Models b/data/2020/neurips/Approximate Heavily-Constrained Learning with Lagrange Multiplier Models new file mode 100644 index 0000000000..79611fab0b --- /dev/null +++ b/data/2020/neurips/Approximate Heavily-Constrained Learning with Lagrange Multiplier Models @@ -0,0 +1 @@ +In machine learning applications such as ranking fairness or fairness over intersectional groups, one often encounters optimization problems with extremely large numbers of constraints. In particular, with ranking fairness tasks, there may even be a variable number of constraints, e.g. one for each query in the training set. In these cases, the standard approach of optimizing a Lagrangian while maintaining one Lagrange multiplier per constraint may no longer be practical. Our proposal is to associate a feature vector with each constraint, and to learn a “multiplier model” that maps each such vector to the corresponding Lagrange multiplier. We prove optimality, approximate feasibility and generalization guarantees under assumptions on the flexibility of the multiplier model, and empirically demonstrate that our method is effective on real-world case studies.
\ No newline at end of file diff --git a/data/2020/neurips/Approximation Based Variance Reduction for Reparameterization Gradients b/data/2020/neurips/Approximation Based Variance Reduction for Reparameterization Gradients new file mode 100644 index 0000000000..2d39a90635 --- /dev/null +++ b/data/2020/neurips/Approximation Based Variance Reduction for Reparameterization Gradients @@ -0,0 +1 @@ +Flexible variational distributions improve variational inference but are harder to optimize. In this work, we present a control variate that is applicable to any reparameterizable distribution with known mean and covariance matrix, e.g. Gaussians with any covariance structure. The control variate is based on a quadratic approximation of the model, and its parameters are set using a double-descent scheme by minimizing the gradient estimator's variance. We empirically show that this control variate leads to large improvements in gradient variance and optimization convergence for inference with non-factorized variational distributions. \ No newline at end of file diff --git a/data/2020/neurips/Assessing SATNet's Ability to Solve the Symbol Grounding Problem b/data/2020/neurips/Assessing SATNet's Ability to Solve the Symbol Grounding Problem new file mode 100644 index 0000000000..6c89330b3b --- /dev/null +++ b/data/2020/neurips/Assessing SATNet's Ability to Solve the Symbol Grounding Problem @@ -0,0 +1 @@ +SATNet is an award-winning MAXSAT solver that can be used to infer logical rules and integrated as a differentiable layer in a deep neural network. It had been shown to solve Sudoku puzzles visually from examples of puzzle digit images, and was heralded as an impressive achievement towards the longstanding AI goal of combining pattern recognition with logical reasoning. In this paper, we clarify SATNet's capabilities by showing that in the absence of intermediate labels that identify individual Sudoku digit images with their logical representations, SATNet completely fails at visual Sudoku (0% test accuracy). More generally, the failure can be pinpointed to its inability to learn to assign symbols to perceptual phenomena, also known as the symbol grounding problem, which has long been thought to be a prerequisite for intelligent agents to perform real-world logical reasoning. We propose an MNIST-based test as an easy instance of the symbol grounding problem that can serve as a sanity check for differentiable symbolic solvers in general. Naive applications of SATNet on this test lead to performance worse than that of models without logical reasoning capabilities. We report on the causes of SATNet's failure and how to prevent them. \ No newline at end of file diff --git a/data/2020/neurips/Assisted Learning: A Framework for Multi-Organization Learning b/data/2020/neurips/Assisted Learning: A Framework for Multi-Organization Learning new file mode 100644 index 0000000000..6ffb067876 --- /dev/null +++ b/data/2020/neurips/Assisted Learning: A Framework for Multi-Organization Learning @@ -0,0 +1 @@ +In an increasing number of AI scenarios, collaborations among different organizations or agents (e.g., humans and robots, mobile units) are often essential to accomplish an organization-specific mission. However, to avoid leaking useful and possibly proprietary information, organizations typically enforce stringent security constraints on sharing modeling algorithms and data, which significantly limits collaborations.
In this work, we introduce the Assisted Learning framework for organizations to assist each other in supervised learning tasks without revealing any organization’s algorithm, data, or even task. An organization seeks assistance by broadcasting task-specific but nonsensitive statistics and incorporating others’ feedback in one or more iterations to eventually improve its predictive performance. Theoretical and experimental studies, including real-world medical benchmarks, show that Assisted Learning can often achieve near-oracle learning performance as if data and training processes were centralized. \ No newline at end of file diff --git a/data/2020/neurips/Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability b/data/2020/neurips/Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability new file mode 100644 index 0000000000..914f49fc5c --- /dev/null +++ b/data/2020/neurips/Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability @@ -0,0 +1 @@ +Explaining AI systems is fundamental both to the development of high performing models and to the trust placed in them by their users. The Shapley framework for explainability has strength in its general applicability combined with its precise, rigorous foundation: it provides a common, model-agnostic language for AI explainability and uniquely satisfies a set of intuitive mathematical axioms. However, Shapley values are too restrictive in one significant regard: they ignore all causal structure in the data. We introduce a less restrictive framework, Asymmetric Shapley values (ASVs), which are rigorously founded on a set of axioms, applicable to any AI system, and flexible enough to incorporate any causal structure known to be respected by the data. We demonstrate that ASVs can (i) improve model explanations by incorporating causal information, (ii) provide an unambiguous test for unfair discrimination in model predictions, (iii) enable sequentially incremental explanations in time-series models, and (iv) support feature-selection studies without the need for model retraining. \ No newline at end of file diff --git a/data/2020/neurips/Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance b/data/2020/neurips/Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance new file mode 100644 index 0000000000..0d90c2e10e --- /dev/null +++ b/data/2020/neurips/Asymptotic Guarantees for Generative Modeling Based on the Smooth Wasserstein Distance @@ -0,0 +1 @@ +Minimum distance estimation (MDE) gained recent attention as a formulation of (implicit) generative modeling. It considers minimizing, over model parameters, a statistical distance between the empirical data distribution and the model. This formulation lends itself well to theoretical analysis, but typical results are hindered by the curse of dimensionality. To overcome this and devise a scalable finite-sample statistical MDE theory, we adopt the framework of smooth 1-Wasserstein distance (SWD) $\mathsf{W}_1^{(\sigma)}$. The SWD was recently shown to preserve the metric and topological structure of classic Wasserstein distances, while enjoying dimension-free empirical convergence rates. In this work, we conduct a thorough statistical study of the minimum smooth Wasserstein estimators (MSWEs), first proving the estimator's measurability and asymptotic consistency. 
We then characterize the limit distribution of the optimal model parameters and their associated minimal SWD. These results imply an $O(n^{-1/2})$ generalization bound for generative modeling based on MSWE, which holds in arbitrary dimension. Our main technical tool is a novel high-dimensional limit distribution result for empirical $\mathsf{W}_1^{(\sigma)}$. The characterization of a nondegenerate limit stands in sharp contrast with the classic empirical 1-Wasserstein distance, for which a similar result is known only in the one-dimensional case. The validity of our theory is supported by empirical results, posing the SWD as a potent tool for learning and inference in high dimensions. \ No newline at end of file diff --git a/data/2020/neurips/Asymptotic normality and confidence intervals for derivatives of 2-layers neural network in the random features model b/data/2020/neurips/Asymptotic normality and confidence intervals for derivatives of 2-layers neural network in the random features model new file mode 100644 index 0000000000..2400d6b56f --- /dev/null +++ b/data/2020/neurips/Asymptotic normality and confidence intervals for derivatives of 2-layers neural network in the random features model @@ -0,0 +1 @@ +This paper studies two-layer Neural Networks (NN), where the first layer contains random weights, and the second layer is trained using Ridge regularization. This model has been the focus of numerous recent works, showing that despite its simplicity, it captures some of the empirically observed behaviors of NN in the over-parametrized regime, such as the double-descent curve where the generalization error decreases as the number of weights increases to $+\infty$. This paper establishes asymptotic distribution results for this two-layer NN model in the regime where the ratios $p/n$ and $d/n$ have finite limits, where $n$ is the sample size, $p$ the ambient dimension, and $d$ the width of the first layer. We show that a weighted average of the derivatives of the trained NN at the observed data is asymptotically normal, in a setting with Lipschitz activation functions in a linear regression response with Gaussian features under possibly non-linear perturbations. We then leverage this asymptotic normality result to construct confidence intervals (CIs) for single components of the unknown regression vector. The novelty of our results is threefold: (1) Despite the nonlinearity induced by the activation function, we characterize the asymptotic distribution of a weighted average of the gradients of the network after training; (2) It provides the first frequentist uncertainty quantification guarantees, in the form of valid $(1-\alpha)$-CIs, based on NN estimates; (3) It shows that the double-descent phenomenon occurs in terms of the length of the CIs, with the length increasing and then decreasing as $d/n \nearrow +\infty$ for certain fixed values of $p/n$. We also provide a toolbox to predict the length of CIs numerically, which lets \ No newline at end of file diff --git a/data/2020/neurips/Asymptotically Optimal Exact Minibatch Metropolis-Hastings b/data/2020/neurips/Asymptotically Optimal Exact Minibatch Metropolis-Hastings new file mode 100644 index 0000000000..abe50e78c6 --- /dev/null +++ b/data/2020/neurips/Asymptotically Optimal Exact Minibatch Metropolis-Hastings @@ -0,0 +1 @@ +Metropolis-Hastings (MH) is a commonly-used MCMC algorithm, but it can be intractable on large datasets due to requiring computations over the whole dataset.
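To make that last sentence concrete: a single random-walk MH step on a Bayesian posterior evaluates a log-likelihood over all N observations (twice, for the current and proposed states). Minibatch methods such as TunaMH aim to cut exactly this per-step scan. The toy Gaussian model below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=100_000)   # large N is what hurts

def log_post(theta):
    # Standard-normal prior, unit-variance Gaussian likelihood:
    # every call reads the entire dataset.
    return -0.5 * theta ** 2 - 0.5 * np.sum((data - theta) ** 2)

theta = 0.0
for _ in range(1000):
    prop = theta + 0.05 * rng.normal()      # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop

print(theta)   # settles near the data mean, ~1.0
```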
In this paper, we study minibatch MH methods, which instead use subsamples to enable scaling. We observe that most existing minibatch MH methods are inexact (i.e. they may change the target distribution), and show that this inexactness can cause arbitrarily large errors in inference. We propose a new exact minibatch MH method, TunaMH, which exposes a tunable trade-off between its batch size and its theoretically guaranteed convergence rate. We prove a lower bound on the batch size that any minibatch MH method must use to retain exactness while guaranteeing fast convergence (the first such bound for minibatch MH), and show TunaMH is asymptotically optimal in terms of the batch size. Empirically, we show TunaMH outperforms other exact minibatch MH methods on robust linear regression, truncated Gaussian mixtures, and logistic regression. \ No newline at end of file diff --git a/data/2020/neurips/Attack of the Tails: Yes, You Really Can Backdoor Federated Learning b/data/2020/neurips/Attack of the Tails: Yes, You Really Can Backdoor Federated Learning new file mode 100644 index 0000000000..a67672c37c --- /dev/null +++ b/data/2020/neurips/Attack of the Tails: Yes, You Really Can Backdoor Federated Learning @@ -0,0 +1 @@ +Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, but also methods to defend against them, and it is currently an open question whether FL systems can be tailored to be robust against backdoors. In this work, we provide evidence to the contrary. We first establish that, in the general case, robustness to backdoors implies model robustness to adversarial examples, a major open problem in itself. Furthermore, detecting the presence of a backdoor in an FL model is unlikely assuming first-order oracles or polynomial time. We couple our theoretical results with a new family of backdoor attacks, which we refer to as edge-case backdoors. An edge-case backdoor forces a model to misclassify on seemingly easy inputs that are however unlikely to be part of the training or test data, i.e., they live on the tail of the input distribution. We explain how these edge-case backdoors can lead to unsavory failures and may have serious repercussions on fairness, and exhibit that with careful tuning on the part of the adversary, one can insert them across a range of machine learning tasks (e.g., image classification, OCR, text prediction, sentiment analysis). \ No newline at end of file diff --git a/data/2020/neurips/AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control b/data/2020/neurips/AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control new file mode 100644 index 0000000000..7a74667bba --- /dev/null +++ b/data/2020/neurips/AttendLight: Universal Attention-Based Reinforcement Learning Model for Traffic Signal Control @@ -0,0 +1 @@ +We propose AttendLight, an end-to-end Reinforcement Learning (RL) algorithm for the problem of traffic signal control. Previous approaches for this problem have the shortcoming that they require training for each new intersection with a different structure or traffic flow distribution.
AttendLight solves this issue by training a single, universal model for intersections with any number of roads, lanes, phases (possible signals), and traffic flow. To this end, we propose a deep RL model which incorporates two attention models. The first attention model is introduced to handle different numbers of roads and lanes; and the second attention model enables decision-making with any number of phases in an intersection. As a result, our proposed model works for any intersection configuration, as long as a similar configuration is represented in the training set. Experiments were conducted with both synthetic and real-world standard benchmark datasets. The results we show cover intersections with three or four approaching roads; one-directional/bi-directional roads with one, two, and three lanes; different numbers of phases; and different traffic flows. We consider two regimes: (i) single-environment training, single-deployment, and (ii) multi-environment training, multi-deployment. AttendLight outperforms both classical and other RL-based approaches in all cases in both regimes. \ No newline at end of file diff --git a/data/2020/neurips/Attention-Gated Brain Propagation: How the brain can implement reward-based error backpropagation b/data/2020/neurips/Attention-Gated Brain Propagation: How the brain can implement reward-based error backpropagation new file mode 100644 index 0000000000..8988cabb49 --- /dev/null +++ b/data/2020/neurips/Attention-Gated Brain Propagation: How the brain can implement reward-based error backpropagation @@ -0,0 +1 @@ +Much recent work has focused on biologically plausible variants of supervised learning algorithms. However, there is no teacher in the motor cortex that instructs the motor neurons, and learning in the brain depends on reward and punishment. We demonstrate a biologically plausible reinforcement learning scheme for deep networks with an arbitrary number of layers. The network chooses an action by selecting a unit in the output layer and uses feedback connections to assign credit to the units in successively lower layers that are responsible for this action. After the choice, the network receives reinforcement and there is no teacher correcting the errors. We show how the new learning scheme – Attention-Gated Brain Propagation (BrainProp) – is mathematically equivalent to error backpropagation, for one output unit at a time. We demonstrate successful learning of deep fully connected, convolutional and locally connected networks on classical and hard image-classification benchmarks: MNIST, CIFAR10, CIFAR100 and Tiny ImageNet. BrainProp achieves an accuracy that is equivalent to that of standard error-backpropagation, and better than state-of-the-art biologically inspired learning schemes. Additionally, the trial-and-error nature of learning is associated with limited additional training time, so that BrainProp is only a factor of 1-3.5 slower. Our results thereby provide new insights into how deep learning may be implemented in the brain. \ No newline at end of file diff --git a/data/2020/neurips/Attribute Prototype Network for Zero-Shot Learning b/data/2020/neurips/Attribute Prototype Network for Zero-Shot Learning new file mode 100644 index 0000000000..2d8950f443 --- /dev/null +++ b/data/2020/neurips/Attribute Prototype Network for Zero-Shot Learning @@ -0,0 +1 @@ +From the beginning of zero-shot learning research, visual attributes have been shown to play an important role.
In order to better transfer attribute-based knowledge from known to unknown classes, we argue that an image representation with integrated attribute localization ability would be beneficial for zero-shot learning. To this end, we propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features using only class-level attributes. While a visual-semantic embedding layer learns global features, local features are learned through an attribute prototype network that simultaneously regresses and decorrelates attributes from intermediate features. We show that our locality augmented image representations achieve a new state-of-the-art on three zero-shot learning benchmarks. As an additional benefit, our model points to the visual evidence of the attributes in an image, e.g. for the CUB dataset, confirming the improved attribute localization ability of our image representation. The code will be publicly available at this https URL. \ No newline at end of file diff --git a/data/2020/neurips/Attribution Preservation in Network Compression for Reliable Network Interpretation b/data/2020/neurips/Attribution Preservation in Network Compression for Reliable Network Interpretation new file mode 100644 index 0000000000..33a2817f21 --- /dev/null +++ b/data/2020/neurips/Attribution Preservation in Network Compression for Reliable Network Interpretation @@ -0,0 +1 @@ +Neural networks embedded in safety-sensitive applications such as self-driving cars and wearable health monitors rely on two important techniques: input attribution for hindsight analysis and network compression to reduce their size for edge computing. In this paper, we show that these seemingly unrelated techniques conflict with each other as network compression deforms the produced attributions, which could lead to dire consequences for mission-critical applications. This phenomenon arises due to the fact that conventional network compression methods only preserve the predictions of the network while ignoring the quality of the attributions. To combat the attribution inconsistency problem, we present a framework that can preserve the attributions while compressing a network. By employing the Weighted Collapsed Attribution Matching regularizer, we match the attribution maps of the network being compressed to its pre-compression former self. We demonstrate the effectiveness of our algorithm both quantitatively and qualitatively on diverse compression methods. \ No newline at end of file diff --git a/data/2020/neurips/Audeo: Audio Generation for a Silent Performance Video b/data/2020/neurips/Audeo: Audio Generation for a Silent Performance Video new file mode 100644 index 0000000000..ae179ca362 --- /dev/null +++ b/data/2020/neurips/Audeo: Audio Generation for a Silent Performance Video @@ -0,0 +1 @@ +We present a novel system that takes as input video frames of a musician playing the piano and generates the music for that video. Generation of music from visual cues is a challenging problem and it is not clear whether it is an attainable goal at all. Our main aim in this work is to explore the plausibility of such a transformation and to identify cues and components able to carry the association of sounds with visual events. To achieve the transformation we built a full pipeline named `\textit{Audeo}' containing three components.
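The attribution-preservation abstract above describes matching the attribution maps of a network being compressed to its pre-compression self. Below is a minimal sketch of such a matching penalty, a plain MSE on channel-collapsed maps rather than the authors' exact Weighted Collapsed Attribution Matching term:

```python
import torch.nn.functional as F

def attribution_matching_loss(attr_compressed, attr_original):
    """Penalize divergence between channel-collapsed attribution maps of the
    compressed network and its pre-compression self. Inputs: (B, C, H, W)."""
    return F.mse_loss(attr_compressed.sum(dim=1), attr_original.sum(dim=1))
```

In practice this term would be added, with some weight, to the usual prediction-matching objective used during compression.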
We first translate the video frames of the keyboard and the musician's hand movements into a raw mechanical symbolic music representation, the Piano-Roll (Roll), which encodes the keys pressed at each time step. We then adapt the Roll to be amenable to audio synthesis by including temporal correlations. This step turns out to be critical for meaningful audio generation. As a last step, we implement MIDI synthesizers to generate realistic music. \textit{Audeo} converts video to audio smoothly and clearly with only a few setup constraints. We evaluate \textit{Audeo} on `in the wild' piano performance videos and find that the generated music is of reasonable audio quality and can be successfully recognized with high precision by popular music identification software. \ No newline at end of file diff --git a/data/2020/neurips/Auditing Differentially Private Machine Learning: How Private is Private SGD? b/data/2020/neurips/Auditing Differentially Private Machine Learning: How Private is Private SGD? new file mode 100644 index 0000000000..0e4e576be6 --- /dev/null +++ b/data/2020/neurips/Auditing Differentially Private Machine Learning: How Private is Private SGD? @@ -0,0 +1 @@ +We investigate whether Differentially Private SGD offers better privacy in practice than what is guaranteed by its state-of-the-art analysis. We do so via novel data poisoning attacks, which we show correspond to realistic privacy attacks. While previous work (Ma et al., arXiv 2019) proposed this connection between differential privacy and data poisoning as a defense against data poisoning, our use of it as a tool for understanding the privacy of a specific mechanism is new. More generally, our work takes a quantitative, empirical approach to understanding the privacy afforded by specific implementations of differentially private algorithms that we believe has the potential to complement and influence analytical work on differential privacy. \ No newline at end of file diff --git a/data/2020/neurips/Auto Learning Attention b/data/2020/neurips/Auto Learning Attention new file mode 100644 index 0000000000..abcb7692a0 --- /dev/null +++ b/data/2020/neurips/Auto Learning Attention @@ -0,0 +1 @@ +Attention modules have been demonstrated to be effective in strengthening the representation ability of a neural network via reweighting spatial or channel features or stacking both operations sequentially. However, designing the structures of different attention operations requires substantial computation and extensive expertise. In this paper, we devise an Auto Learning Attention (AutoLA) method, which is the first attempt at automatic attention design. Specifically, we define a novel attention module named high order group attention (HOGA) as a directed acyclic graph (DAG) where each group represents a node, and each edge represents an operation of heterogeneous attentions. A typical HOGA architecture can be searched automatically via the differential AutoLA method within 1 GPU day using the ResNet-20 backbone on CIFAR10. Further, the searched attention module can generalize to various backbones as a plug-and-play component and outperforms popular manually designed channel and spatial attentions for many vision tasks, including image classification on CIFAR100 and ImageNet, object detection and human keypoint detection on the COCO dataset. Code is available at https://github.com/btma48/AutoLA.
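The auditing abstract above turns attack success rates into a quantitative privacy statement. A common way to do this, sketched here under the standard differential-privacy hypothesis-testing inequality TPR <= exp(eps) * FPR + delta (not necessarily the paper's exact estimator), is to convert an attack's true/false positive rates into an empirical lower bound on epsilon:

```python
import math

def empirical_epsilon_lower_bound(tpr, fpr, delta=0.0):
    """Lower-bound the effective epsilon certified by an attack with the
    given true-positive and false-positive rates, via
    TPR <= exp(eps) * FPR + delta."""
    if fpr <= 0 or tpr <= delta:
        return float("inf") if fpr <= 0 < tpr - delta else 0.0
    return math.log((tpr - delta) / fpr)

# e.g. an attack with TPR 0.8 and FPR 0.05 certifies eps >= ln(16) ~= 2.77
print(empirical_epsilon_lower_bound(0.8, 0.05))
```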
\ No newline at end of file diff --git a/data/2020/neurips/Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation b/data/2020/neurips/Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation new file mode 100644 index 0000000000..b92bfe1f5f --- /dev/null +++ b/data/2020/neurips/Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation @@ -0,0 +1 @@ +Panoptic segmentation is posed as a new popular test-bed for the state-of-the-art holistic scene understanding methods with the requirement of simultaneously segmenting both foreground things and background stuff. The state-of-the-art panoptic segmentation network exhibits high structural complexity in different network components, i.e. backbone, proposal-based foreground branch, segmentation-based background branch, and feature fusion module across branches, which heavily relies on expert knowledge and tedious trials. In this work, we propose an efficient, cooperative and highly automated framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module in a unified panoptic segmentation pipeline based on the prevailing one-shot Network Architecture Search (NAS) paradigm. Notably, we extend the common single-task NAS into the multi-component scenario by taking advantage of the newly proposed intra-modular search space and problem-oriented inter-modular search space, which helps us to obtain an optimal network architecture that not only performs well in both instance segmentation and semantic segmentation tasks but is also aware of the reciprocal relations between foreground things and background stuff classes. To relieve the vast computation burden incurred by applying NAS to complicated network architectures, we present a novel path-priority greedy search policy to find a robust, transferable architecture with significantly reduced searching overhead. Our searched architecture, namely Auto-Panoptic, achieves the new state-of-the-art on the challenging COCO and ADE20K benchmarks. Moreover, extensive experiments are conducted to demonstrate the effectiveness of the path-priority policy and the transferability of Auto-Panoptic across different datasets. Codes and models are available at: this https URL. \ No newline at end of file diff --git a/data/2020/neurips/AutoBSS: An Efficient Algorithm for Block Stacking Style Search b/data/2020/neurips/AutoBSS: An Efficient Algorithm for Block Stacking Style Search new file mode 100644 index 0000000000..4cf02543c5 --- /dev/null +++ b/data/2020/neurips/AutoBSS: An Efficient Algorithm for Block Stacking Style Search @@ -0,0 +1 @@ +Neural network architecture design mostly focuses on new convolutional operators or special topological structures of network blocks, while little attention is paid to the configuration of stacking each block, called the Block Stacking Style (BSS). Recent studies show that BSS may also have a non-negligible impact on networks; thus we design an efficient algorithm to search for it automatically. The proposed method, AutoBSS, is a novel AutoML algorithm based on Bayesian optimization by iteratively refining and clustering Block Stacking Style Code (BSSC), which can find an optimal BSS in a few trials without biased evaluation. On the ImageNet classification task, ResNet50/MobileNetV2/EfficientNet-B0 with our searched BSS achieve 79.29%/74.5%/77.79%, which outperform the original baselines by a large margin.
More importantly, experimental results on model compression, object detection and instance segmentation show the strong generalizability of the proposed AutoBSS, and further verify the non-negligible impact of BSS on neural networks. \ No newline at end of file diff --git a/data/2020/neurips/AutoPrivacy: Automated Layer-wise Parameter Selection for Secure Neural Network Inference b/data/2020/neurips/AutoPrivacy: Automated Layer-wise Parameter Selection for Secure Neural Network Inference new file mode 100644 index 0000000000..5f7a9958c3 --- /dev/null +++ b/data/2020/neurips/AutoPrivacy: Automated Layer-wise Parameter Selection for Secure Neural Network Inference @@ -0,0 +1,2 @@ +Hybrid Privacy-Preserving Neural Network (HPPNN) implementing linear layers by Homomorphic Encryption (HE) and nonlinear layers by Garbled Circuit (GC) is one of the most promising secure solutions to emerging Machine Learning as a Service (MLaaS). Unfortunately, an HPPNN suffers from long inference latency, e.g., $\sim100$ seconds per image, which makes MLaaS unsatisfactory. Because the HE-based linear layers of an HPPNN account for $93\%$ of the inference latency, it is critical to select a set of HE parameters that minimizes the computational overhead of the linear layers. Prior HPPNNs over-pessimistically select huge HE parameters to maintain large noise budgets, since they use the same set of HE parameters for an entire network and ignore the error tolerance capability of a network. +In this paper, for fast and accurate secure neural network inference, we propose an automated layer-wise parameter selector, AutoPrivacy, that leverages deep reinforcement learning to automatically determine a set of HE parameters for each linear layer in an HPPNN. The learning-based HE parameter selection policy outperforms conventional rule-based HE parameter selection policies. Compared to prior HPPNNs, AutoPrivacy-optimized HPPNNs reduce inference latency by $53\%\sim70\%$ with negligible loss of accuracy. \ No newline at end of file diff --git a/data/2020/neurips/AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning b/data/2020/neurips/AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning new file mode 100644 index 0000000000..4fc84ec271 --- /dev/null +++ b/data/2020/neurips/AutoSync: Learning to Synchronize for Data-Parallel Distributed Deep Learning @@ -0,0 +1 @@ +The rationale behind Eq. 1 is as follows: (1) Since many runtime systems (e.g. TensorFlow [1] or PyTorch [3]) introduce scheduling or parallelization between communication and computation, in practice, there are significant overlaps between the two components; (2) in data-parallel training, it is commonly observed that one component usually dominates the other [4]. Together, these observations make it reasonable to estimate the total time as the maximum of the two components. \ No newline at end of file diff --git a/data/2020/neurips/Autoencoders that don't overfit towards the Identity b/data/2020/neurips/Autoencoders that don't overfit towards the Identity new file mode 100644 index 0000000000..fc902473ca --- /dev/null +++ b/data/2020/neurips/Autoencoders that don't overfit towards the Identity @@ -0,0 +1 @@ +Autoencoders (AE) aim to reproduce the output from the input. They may hence tend to overfit towards learning the identity-function between the input and output, i.e., they may predict each feature in the output from itself in the input. This is not useful, however, when AEs are used for prediction tasks in the presence of noise in the data.
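The AutoSync fragment above reduces to a one-line estimator: with overlap between communication and computation, and one component typically dominating, the per-step time is approximated by the maximum of the two. A tiny worked example (with assumed timings):

```python
def step_time_estimate(comm_time, comp_time):
    """Estimate total per-step time as max(communication, computation),
    following the rationale described above for overlapped execution."""
    return max(comm_time, comp_time)

# e.g. 120 ms of gradient all-reduce overlapped with 90 ms of compute
assert step_time_estimate(0.120, 0.090) == 0.120
```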
It may seem intuitively evident that this kind of overfitting is prevented by training a denoising AE [36], as the dropped-out features have to be predicted from the other features. In this paper, we consider linear autoencoders, as they facilitate analytic solutions, and first show that denoising / dropout actually prevents the overfitting towards the identity-function only to the degree that it is penalized by the induced L2-norm regularization. In the main theorem of this paper, we show that the emphasized denoising AE [37] is indeed capable of completely eliminating the overfitting towards the identity-function. Our derivations reveal several new insights, including the closed-form solution of the full-rank model, as well as a new (near-)orthogonality constraint in the low-rank model. While this constraint is conceptually very different from the regularizers recently proposed in [11, 42, 14], their resulting effects on the learned embeddings are empirically similar. Our experiments on three well-known data-sets corroborate the various theoretical insights derived in this paper. \ No newline at end of file diff --git a/data/2020/neurips/Autofocused oracles for model-based design b/data/2020/neurips/Autofocused oracles for model-based design new file mode 100644 index 0000000000..37033823a4 --- /dev/null +++ b/data/2020/neurips/Autofocused oracles for model-based design @@ -0,0 +1 @@ +Data-driven design is making headway into a number of application areas, including protein, small-molecule, and materials engineering. The design goal is to construct an object with desired properties, such as a protein that binds to a target more tightly than previously observed. To that end, costly experimental measurements are being replaced with calls to a high-capacity regression model trained on labeled data, which can be leveraged in an in silico search for promising design candidates. However, the design goal necessitates moving into regions of the input space beyond where such models were trained. Therefore, one can ask: should the regression model be altered as the design algorithm explores the input space, in the absence of new data acquisition? Herein, we answer this question in the affirmative. In particular, we (i) formalize the data-driven design problem as a non-zero-sum game, (ii) leverage this formalism to develop a strategy for retraining the regression model as the design algorithm proceeds---what we refer to as autofocusing the model, and (iii) demonstrate the promise of autofocusing empirically. \ No newline at end of file diff --git a/data/2020/neurips/Automatic Curriculum Learning through Value Disagreement b/data/2020/neurips/Automatic Curriculum Learning through Value Disagreement new file mode 100644 index 0000000000..36b8dfc0d9 --- /dev/null +++ b/data/2020/neurips/Automatic Curriculum Learning through Value Disagreement @@ -0,0 +1 @@ +Continually solving new, unsolved tasks is the key to learning diverse behaviors. Through reinforcement learning (RL), we have made massive strides towards solving tasks that have a single goal. However, in the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency. When biological agents learn, there is often an organized and meaningful order to which learning happens. Inspired by this, we propose setting up an automatic curriculum for goals that the agent needs to solve. 
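The linear-autoencoder analysis above ties denoising/dropout to an induced L2 penalty. A quick way to see the identity-overfitting it describes is the closed-form ridge-regularized linear AE below (a sketch: with lam = 0 and full-rank inputs the solution is exactly the identity, and lam > 0 shrinks it away):

```python
import numpy as np

def linear_ae_weights(X, lam):
    """Full-rank linear autoencoder with L2 regularization, in closed form:
    B = argmin ||X - X B||^2 + lam ||B||^2 = (X^T X + lam I)^{-1} X^T X.
    lam = 0 recovers the identity; lam > 0 shrinks it."""
    d = X.shape[1]
    G = X.T @ X
    return np.linalg.solve(G + lam * np.eye(d), G)
```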
Our key insight is that if we can sample goals at the frontier of the set of goals that an agent is able to reach, it will provide a significantly stronger learning signal compared to randomly sampled goals. To operationalize this idea, we introduce a goal proposal module that prioritizes goals that maximize the epistemic uncertainty of the Q-function of the policy. This simple technique samples goals that are neither too hard nor too easy for the agent to solve, hence enabling continual improvement. We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods. \ No newline at end of file diff --git a/data/2020/neurips/Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond b/data/2020/neurips/Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond new file mode 100644 index 0000000000..98d4f72395 --- /dev/null +++ b/data/2020/neurips/Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond @@ -0,0 +1 @@ +Linear relaxation based perturbation analysis (LiRPA) for neural networks, which computes provable linear bounds of output neurons given a certain amount of input perturbation, has become a core component in robustness verification and certified defense. The majority of LiRPA-based methods focus on simple feed-forward networks and need particular manual derivations and implementations when extended to other architectures. In this paper, we develop an automatic framework to enable perturbation analysis on any neural network structures, by generalizing existing LiRPA algorithms such as CROWN to operate on general computational graphs. The flexibility, differentiability and ease of use of our framework allow us to obtain state-of-the-art results on LiRPA-based certified defense on fairly complicated networks like DenseNet, ResNeXt and Transformer that are not supported by prior works. Our framework also enables loss fusion, a technique that significantly reduces the computational complexity of LiRPA for certified defense. For the first time, we demonstrate LiRPA-based certified defense on Tiny ImageNet and Downscaled ImageNet, to which previous approaches cannot scale due to the relatively large number of classes. Our work also yields an open-source library for the community to apply LiRPA to areas beyond certified defense without much LiRPA expertise, e.g., we create a neural network with a probably flat optimization landscape by applying LiRPA to network parameters. Our open-source library is available at https://github.com/KaidiXu/auto_LiRPA. \ No newline at end of file diff --git a/data/2020/neurips/Automatically Learning Compact Quality-aware Surrogates for Optimization Problems b/data/2020/neurips/Automatically Learning Compact Quality-aware Surrogates for Optimization Problems new file mode 100644 index 0000000000..0330d86bb4 --- /dev/null +++ b/data/2020/neurips/Automatically Learning Compact Quality-aware Surrogates for Optimization Problems @@ -0,0 +1 @@ +Solving optimization problems with unknown parameters often requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality.
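The value-disagreement curriculum described at the start of this hunk prioritizes goals by the epistemic uncertainty of the Q-function. A minimal sketch, assuming an ensemble of value estimators `q_ensemble` where each `q(state, goals)` returns one value estimate per candidate goal:

```python
import numpy as np

def goal_scores(q_ensemble, state, goals):
    """Score candidate goals by the disagreement (std) across an ensemble of
    value estimates -- high disagreement marks frontier goals."""
    values = np.stack([q(state, goals) for q in q_ensemble])  # (E, G)
    return values.std(axis=0)                                 # (G,)

def sample_goal(q_ensemble, state, goals, rng):
    p = goal_scores(q_ensemble, state, goals) + 1e-8  # avoid all-zero weights
    p = p / p.sum()
    return goals[rng.choice(len(goals), p=p)]
```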
Unfortunately, this process comes at a large computational cost because the optimization problem must be solved and differentiated through in each training iteration; furthermore, it may also sometimes fail to improve solution quality due to non-smoothness issues that arise when training through a complex optimization layer. To address these shortcomings, we learn a low-dimensional surrogate model of a large optimization problem by representing the feasible space in terms of meta-variables, each of which is a linear combination of the original variables. By training a low-dimensional surrogate model end-to-end, and jointly with the predictive model, we achieve: i) a large reduction in training and inference time; and ii) improved performance by focusing attention on the more important variables in the optimization and learning in a smoother space. Empirically, we demonstrate these improvements on a non-convex adversary modeling task, a submodular recommendation task and a convex portfolio optimization task. \ No newline at end of file diff --git a/data/2020/neurips/Autoregressive Score Matching b/data/2020/neurips/Autoregressive Score Matching new file mode 100644 index 0000000000..8fe77f7e67 --- /dev/null +++ b/data/2020/neurips/Autoregressive Score Matching @@ -0,0 +1 @@ +Autoregressive models use the chain rule to define a joint probability distribution as a product of conditionals. These conditionals need to be normalized, imposing constraints on the functional families that can be used. To increase flexibility, we propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores), which need not be normalized. To train AR-CSM, we introduce a new divergence between distributions named Composite Score Matching (CSM). For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training. Compared to previous score matching algorithms, our method is more scalable to high dimensional data and more stable to optimize. We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders. \ No newline at end of file diff --git a/data/2020/neurips/Auxiliary Task Reweighting for Minimum-data Learning b/data/2020/neurips/Auxiliary Task Reweighting for Minimum-data Learning new file mode 100644 index 0000000000..1cd3fbbeb9 --- /dev/null +++ b/data/2020/neurips/Auxiliary Task Reweighting for Minimum-data Learning @@ -0,0 +1 @@ +Supervised learning requires a large amount of training data, limiting its application where labeled data is scarce. To compensate for data scarcity, one possible method is to utilize auxiliary tasks to provide additional supervision for the main task. Assigning and optimizing the importance weights for different auxiliary tasks remains a crucial and largely understudied research question. In this work, we propose a method to automatically reweight auxiliary tasks in order to reduce the data requirement on the main task. Specifically, we formulate the weighted likelihood function of auxiliary tasks as a surrogate prior for the main task.
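The chain-rule factorization the autoregressive score matching abstract starts from is worth stating concretely: log p(x) = sum_i log p(x_i | x_<i). A minimal sketch, assuming each `conditionals[i]` maps a prefix x[:i] to a normalized torch.distributions object (AR-CSM's point is precisely to drop this normalization requirement by working with scores instead):

```python
import torch

def autoregressive_log_prob(conditionals, x):
    """Joint log-probability of a 1-D tensor x under the chain rule:
    log p(x) = sum_i log p(x_i | x_<i)."""
    return sum(conditionals[i](x[:i]).log_prob(x[i]) for i in range(len(x)))
```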
By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search. In multiple experimental settings (e.g. semi-supervised learning, multi-label classification), we demonstrate that, with the benefit of auxiliary tasks, our algorithm utilizes the limited labeled data of the main task more effectively than previous task reweighting methods. We also show that in extreme cases with only a few extra examples (e.g. few-shot domain adaptation), our algorithm results in significant improvement over the baseline. \ No newline at end of file diff --git a/data/2020/neurips/AvE: Assistance via Empowerment b/data/2020/neurips/AvE: Assistance via Empowerment new file mode 100644 index 0000000000..45d7349bae --- /dev/null +++ b/data/2020/neurips/AvE: Assistance via Empowerment @@ -0,0 +1 @@ +One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s). Existing methods tend to rely on inferring the human's goal, which is challenging when there are many potential goals or when the set of candidate goals is difficult to identify. We propose a new paradigm for assistance by instead increasing the human's ability to control their environment, and formalize this approach by augmenting reinforcement learning with human empowerment. This task-agnostic objective preserves the person's autonomy and ability to achieve any eventual state. We test our approach against assistance based on goal inference, highlighting scenarios where our method overcomes failure modes stemming from goal ambiguity or misspecification. As existing methods for estimating empowerment in continuous domains are computationally hard, precluding their use in real-time learned assistance, we also propose an efficient empowerment-inspired proxy metric. Using this, we are able to successfully demonstrate our method in a shared autonomy user study for a challenging simulated teleoperation task with human-in-the-loop training. \ No newline at end of file diff --git a/data/2020/neurips/Avoiding Side Effects By Considering Future Tasks b/data/2020/neurips/Avoiding Side Effects By Considering Future Tasks new file mode 100644 index 0000000000..4946141472 --- /dev/null +++ b/data/2020/neurips/Avoiding Side Effects By Considering Future Tasks @@ -0,0 +1 @@ +Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. The future task reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents. To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default.
We formally define interference incentives and show that the future task approach with a baseline policy avoids these incentives in the deterministic case. Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions. \ No newline at end of file diff --git a/data/2020/neurips/Avoiding Side Effects in Complex Environments b/data/2020/neurips/Avoiding Side Effects in Complex Environments new file mode 100644 index 0000000000..1784bffefa --- /dev/null +++ b/data/2020/neurips/Avoiding Side Effects in Complex Environments @@ -0,0 +1 @@ +Reward function specification can be difficult, even in simple environments. Realistic environments contain millions of states. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoids side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead, completes the specified task, and avoids side effects. \ No newline at end of file diff --git a/data/2020/neurips/Axioms for Learning from Pairwise Comparisons b/data/2020/neurips/Axioms for Learning from Pairwise Comparisons new file mode 100644 index 0000000000..449b5aa73d --- /dev/null +++ b/data/2020/neurips/Axioms for Learning from Pairwise Comparisons @@ -0,0 +1 @@ +To be well-behaved, systems that process preference data must satisfy certain conditions identified by economic decision theory and by social choice theory. In ML, preferences and rankings are commonly learned by fitting a probabilistic model to noisy preference data. The behavior of this learning process from the view of economic theory has previously been studied for the case where the data consists of rankings. In practice, it is more common to have only pairwise comparison data, and the formal properties of the associated learning problem are more challenging to analyze. We show that a large class of random utility models (including the Thurstone–Mosteller Model), when estimated using the MLE, satisfies a Pareto efficiency condition. These models also satisfy a strong monotonicity property, which implies that the learning process is responsive to input data. On the other hand, we show that these models fail certain other consistency conditions from social choice theory, and in particular do not always follow the majority opinion. Our results inform existing and future applications of random utility models for societal decision making. \ No newline at end of file diff --git a/data/2020/neurips/BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning b/data/2020/neurips/BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning new file mode 100644 index 0000000000..72758ad5ff --- /dev/null +++ b/data/2020/neurips/BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning @@ -0,0 +1 @@ +There has recently been a surge in research in batch Deep Reinforcement Learning (DRL), which aims to learn a high-performing policy from a given dataset without additional interactions with the environment.
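The two side-effects abstracts above share a core mechanism: penalize the change, relative to a baseline action, in the agent's ability to achieve auxiliary goals. A minimal AUP-style sketch, where `q_aux` is an assumed list of auxiliary-goal value functions and `baseline_action` is a default such as a no-op:

```python
def aup_penalty(q_aux, state, action, baseline_action):
    """Attainable-utility-style penalty: total shift in auxiliary values
    caused by taking `action` instead of the baseline (a sketch, not the
    papers' exact formulation)."""
    return sum(abs(q(state, action) - q(state, baseline_action)) for q in q_aux)
```

This penalty would be subtracted, with some weight, from the task reward during training.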
We propose a new algorithm, Best-Action Imitation Learning (BAIL), which strives for both simplicity and performance. BAIL learns a V function, uses the V function to select actions it believes to be high-performing, and then uses those actions to train a policy network using imitation learning. For the MuJoCo benchmark, we provide a comprehensive experimental study of BAIL, comparing its performance to four other batch Q-learning and imitation-learning schemes for a large variety of batch datasets. Our experiments show that BAIL's performance is much higher than that of the other schemes, and is also computationally much faster than the batch Q-learning schemes. \ No newline at end of file diff --git a/data/2020/neurips/BERT Loses Patience: Fast and Robust Inference with Early Exit b/data/2020/neurips/BERT Loses Patience: Fast and Robust Inference with Early Exit new file mode 100644 index 0000000000..55e4ad560d --- /dev/null +++ b/data/2020/neurips/BERT Loses Patience: Fast and Robust Inference with Early Exit @@ -0,0 +1 @@ +In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model (PLM). To achieve this, our approach couples an internal classifier with each layer of a PLM and dynamically stops inference when the intermediate predictions of the internal classifiers remain unchanged for a pre-defined number of steps. Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers. Meanwhile, experimental results with an ALBERT model show that our method can improve the accuracy and robustness of the model by preventing it from overthinking and exploiting multiple classifiers for prediction, yielding a better accuracy-speed trade-off compared to existing early exit methods. \ No newline at end of file diff --git a/data/2020/neurips/BOSS: Bayesian Optimization over String Spaces b/data/2020/neurips/BOSS: Bayesian Optimization over String Spaces new file mode 100644 index 0000000000..6f14831994 --- /dev/null +++ b/data/2020/neurips/BOSS: Bayesian Optimization over String Spaces @@ -0,0 +1 @@ +This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops. Recent applications of BO over strings have been hindered by the need to map inputs into a smooth and unconstrained latent space. Learning this projection is computationally and data-intensive. Our approach instead builds a powerful Gaussian process surrogate model based on string kernels, naturally supporting variable length inputs, and performs efficient acquisition function maximization for spaces with syntactical constraints. Experiments demonstrate considerably improved optimization over existing approaches across a broad range of constraints, including the popular setting where syntax is governed by a context-free grammar. \ No newline at end of file diff --git a/data/2020/neurips/BRP-NAS: Prediction-based NAS using GCNs b/data/2020/neurips/BRP-NAS: Prediction-based NAS using GCNs new file mode 100644 index 0000000000..b8049b09ef --- /dev/null +++ b/data/2020/neurips/BRP-NAS: Prediction-based NAS using GCNs @@ -0,0 +1 @@ +Neural architecture search (NAS) enables researchers to automatically explore broad design spaces in order to improve efficiency of neural networks.
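The patience rule in the early-exit abstract above is simple enough to state in a few lines. A minimal sketch, assuming per-layer modules `layers` with an internal classifier `classifiers[i]` attached to each (names are illustrative, not the paper's code):

```python
import torch

def patient_early_exit(layers, classifiers, x, patience=3):
    """Run the network layer by layer and stop as soon as the intermediate
    prediction has stayed unchanged for `patience` consecutive layers."""
    prev, streak, h = None, 0, x
    for layer, clf in zip(layers, classifiers):
        h = layer(h)
        pred = clf(h).argmax(dim=-1)
        streak = streak + 1 if prev is not None and torch.equal(pred, prev) else 1
        prev = pred
        if streak >= patience:
            break
    return prev
```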
This efficiency is especially important in the case of on-device deployment, where improvements in accuracy should be balanced out with computational demands of a model. In practice, performance metrics of a model are computationally expensive to obtain. Previous work uses a proxy (e.g., number of operations) or a layer-wise measurement of neural network layers to estimate end-to-end hardware performance, but the imprecise prediction diminishes the quality of NAS. To address this problem, we propose BRP-NAS, an efficient hardware-aware NAS enabled by an accurate performance predictor based on a graph convolutional network (GCN). What is more, we investigate prediction quality on different metrics and show that sample efficiency of the predictor-based NAS can be improved by considering binary relations of models and an iterative data selection strategy. We show that our proposed method outperforms all prior methods on NAS-Bench-101, NAS-Bench-201 and DARTS. Finally, to raise awareness of the fact that accurate latency estimation is not a trivial task, we release LatBench -- a latency dataset of NAS-Bench-201 models running on a broad range of devices. \ No newline at end of file diff --git a/data/2020/neurips/Backpropagating Linearly Improves Transferability of Adversarial Examples b/data/2020/neurips/Backpropagating Linearly Improves Transferability of Adversarial Examples new file mode 100644 index 0000000000..e8c22b3c0d --- /dev/null +++ b/data/2020/neurips/Backpropagating Linearly Improves Transferability of Adversarial Examples @@ -0,0 +1 @@ +The vulnerability of deep neural networks (DNNs) to adversarial examples has drawn great attention from the community. In this paper, we study the transferability of such examples, which lays the foundation of many black-box attacks on DNNs. We revisit a not-so-new but definitely noteworthy hypothesis of Goodfellow et al. and disclose that the transferability can be enhanced by improving the linearity of DNNs in an appropriate manner. We introduce linear backpropagation (LinBP), a method that performs backpropagation in a more linear fashion using off-the-shelf attacks that exploit gradients. More specifically, it computes the forward pass as normal but backpropagates the loss as if some nonlinear activations were not encountered in the forward pass. Experimental results demonstrate that this simple yet effective method clearly outperforms the current state of the art in crafting transferable adversarial examples on CIFAR-10 and ImageNet, leading to more effective attacks on a variety of DNNs. \ No newline at end of file diff --git a/data/2020/neurips/Bad Global Minima Exist and SGD Can Reach Them b/data/2020/neurips/Bad Global Minima Exist and SGD Can Reach Them new file mode 100644 index 0000000000..16c06cdd60 --- /dev/null +++ b/data/2020/neurips/Bad Global Minima Exist and SGD Can Reach Them @@ -0,0 +1 @@ +Several recent works have aimed to explain why severely overparameterized models generalize well when trained by Stochastic Gradient Descent (SGD). The emergent consensus explanation has two parts: the first is that there are "no bad local minima", while the second is that SGD performs implicit regularization by having a bias towards low complexity models. We revisit both of these ideas in the context of image classification with common deep neural network architectures. Our first finding is that there exist bad global minima, i.e., models that fit the training set perfectly, yet have poor generalization.
Our second finding is that given only unlabeled training data, we can easily construct initializations that will cause SGD to quickly converge to such bad global minima. For example, on CIFAR, CINIC10, and (Restricted) ImageNet, this can be achieved by starting SGD at a model derived by fitting random labels on the training data: while subsequent SGD training (with the correct labels) will reach zero training error, the resulting model will exhibit a test accuracy degradation of up to 40% compared to training from a random initialization. Finally, we show that regularization seems to provide SGD with an escape route: once heuristics such as data augmentation are used, starting from a complex model (adversarial initialization) has no effect on the test accuracy. \ No newline at end of file diff --git a/data/2020/neurips/Balanced Meta-Softmax for Long-Tailed Visual Recognition b/data/2020/neurips/Balanced Meta-Softmax for Long-Tailed Visual Recognition new file mode 100644 index 0000000000..b8207000b0 --- /dev/null +++ b/data/2020/neurips/Balanced Meta-Softmax for Long-Tailed Visual Recognition @@ -0,0 +1 @@ +Deep classifiers have achieved great success in visual recognition. However, real-world data is long-tailed by nature, leading to a mismatch between training and testing distributions. In this paper, we show that the Softmax function, though used in most classification tasks, gives a biased gradient estimate under the long-tailed setup. This paper presents Balanced Softmax, an elegant unbiased extension of Softmax, to accommodate the label distribution shift between training and testing. Theoretically, we derive the generalization bound for multiclass Softmax regression and show our loss minimizes the bound. In addition, we introduce Balanced Meta-Softmax, applying a complementary Meta Sampler to estimate the optimal class sample rate and further improve long-tailed learning. In our experiments, we demonstrate that Balanced Meta-Softmax outperforms state-of-the-art long-tailed classification solutions on both visual recognition and instance segmentation tasks. \ No newline at end of file diff --git a/data/2020/neurips/Bandit Linear Control b/data/2020/neurips/Bandit Linear Control new file mode 100644 index 0000000000..a3e44ee833 --- /dev/null +++ b/data/2020/neurips/Bandit Linear Control @@ -0,0 +1 @@ +We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback. Unlike the full feedback setting where the entire cost function is revealed after each decision, here only the cost incurred by the learner is observed. We present a new and efficient algorithm that, for strongly convex and smooth costs, obtains regret that grows with the square root of the time horizon $T$. We also give extensions of this result to general convex, possibly non-smooth costs, and to non-stochastic system noise. A key component of our algorithm is a new technique for addressing bandit optimization of loss functions with memory. \ No newline at end of file diff --git a/data/2020/neurips/Bandit Samplers for Training Graph Neural Networks b/data/2020/neurips/Bandit Samplers for Training Graph Neural Networks new file mode 100644 index 0000000000..9351f3e39d --- /dev/null +++ b/data/2020/neurips/Bandit Samplers for Training Graph Neural Networks @@ -0,0 +1 @@ +Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
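The Balanced Softmax abstract above corrects the bias of the standard Softmax under a long-tailed label distribution. One common formulation, sketched here under the assumption that it amounts to shifting the logits by log class priors (hedged; consult the paper for the exact derivation):

```python
import torch
import torch.nn.functional as F

def balanced_softmax_loss(logits, labels, class_counts):
    """Cross-entropy on prior-adjusted logits: adding log(n_y) to each class
    logit reweights the Softmax by the training label frequencies."""
    adjusted = logits + torch.log(class_counts.float())  # class_counts: (C,)
    return F.cross_entropy(adjusted, labels)
```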
However, due to the intractable computation of the optimal sampling distribution, these sampling algorithms are suboptimal for GCNs and are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT). The fundamental reason is that the embeddings of the neighbors or learned weights involved in the optimal sampling distribution are changing during the training and not known a priori, but only partially observed when sampled, thus making the derivation of an optimal variance-reduced sampler non-trivial. In this paper, we formulate the optimization of the sampling variance as an adversary bandit problem, where the rewards are related to the node embeddings and learned weights, and can vary constantly. Thus a good sampler needs to acquire variance information about more neighbors (exploration) while at the same time optimizing the immediate sampling variance (exploitation). We theoretically show that our algorithm asymptotically approaches the optimal variance within a factor of 3. We show the efficiency and effectiveness of our approach on multiple datasets. \ No newline at end of file diff --git a/data/2020/neurips/BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits b/data/2020/neurips/BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits new file mode 100644 index 0000000000..2e0573e3b9 --- /dev/null +++ b/data/2020/neurips/BanditPAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits @@ -0,0 +1 @@ +Clustering is a ubiquitous task in data science. Compared to the commonly used $k$-means clustering algorithm, $k$-medoids clustering algorithms require the cluster centers to be actual data points and support arbitrary distance metrics, allowing for greater interpretability and the clustering of structured objects. Current state-of-the-art $k$-medoids clustering algorithms, such as Partitioning Around Medoids (PAM), are iterative and are quadratic in the dataset size $n$ for each iteration, making them prohibitively expensive for large datasets. We propose Bandit-PAM, a randomized algorithm inspired by techniques from multi-armed bandits, that significantly improves the computational efficiency of PAM. We theoretically prove that Bandit-PAM reduces the complexity of each PAM iteration from $O(n^2)$ to $O(n \log n)$ and returns the same results with high probability, under assumptions on the data that often hold in practice. We empirically validate our results on several large-scale real-world datasets, including a coding exercise submissions dataset from this http URL, the 10x Genomics 68k PBMC single-cell RNA sequencing dataset, and the MNIST handwritten digits dataset. We observe that Bandit-PAM returns the same results as PAM while performing up to 200x fewer distance computations. The improvements demonstrated by Bandit-PAM enable $k$-medoids clustering on a wide range of applications, including identifying cell types in large-scale single-cell data and providing scalable feedback for students learning computer science online. We also release Python and C++ implementations of our algorithm.
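The core sampling trick behind Bandit-PAM's speedup is to estimate each candidate's total distance from a random reference batch instead of all $n$ points. A deliberately simplified sketch (1-D data, no confidence bounds or adaptive elimination, which the full algorithm adds on top):

```python
import numpy as np

def approx_medoid_1d(x, rng, batch=64):
    """Pick an approximate medoid of a 1-D array by estimating each point's
    average distance to the data from a random reference batch."""
    ref = x[rng.choice(len(x), size=batch, replace=False)]
    est = np.abs(x[:, None] - ref[None, :]).mean(axis=1)  # (n,) estimates
    return int(est.argmin())
```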
\ No newline at end of file diff --git a/data/2020/neurips/Barking up the right tree: an approach to search over molecule synthesis DAGs b/data/2020/neurips/Barking up the right tree: an approach to search over molecule synthesis DAGs new file mode 100644 index 0000000000..a78880772c --- /dev/null +++ b/data/2020/neurips/Barking up the right tree: an approach to search over molecule synthesis DAGs @@ -0,0 +1 @@ +When designing new molecules with particular properties, it is not only important what to make but crucially how to make it. These instructions form a synthesis directed acyclic graph (DAG), describing how a large vocabulary of simple building blocks can be recursively combined through chemical reactions to create more complicated molecules of interest. In contrast, many current deep generative models for molecules ignore synthesizability. We therefore propose a deep generative model that better represents the real world process, by directly outputting molecule synthesis DAGs. We argue that this provides sensible inductive biases, ensuring that our model searches over the same chemical space that chemists would also have access to, as well as interpretability. We show that our approach is able to model chemical space well, producing a wide range of diverse molecules, and allows for unconstrained optimization of an inherently constrained problem: maximize certain chemical properties such that discovered molecules are synthesizable. \ No newline at end of file diff --git a/data/2020/neurips/Batch normalization provably avoids ranks collapse for randomly initialised deep networks b/data/2020/neurips/Batch normalization provably avoids ranks collapse for randomly initialised deep networks new file mode 100644 index 0000000000..e5a68ca9c5 --- /dev/null +++ b/data/2020/neurips/Batch normalization provably avoids ranks collapse for randomly initialised deep networks @@ -0,0 +1 @@ +Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used. We here investigate this phenomenon by revisiting the connection between random initialization in deep networks and spectral instabilities in products of random matrices. Given the rich literature on random matrices, it is not surprising to find that the rank of the intermediate representations in unnormalized networks collapses quickly with depth. In this work we highlight the fact that batch normalization is an effective strategy to avoid rank collapse for both linear and ReLU networks. Leveraging tools from Markov chain theory, we derive a meaningful lower rank bound in deep linear networks. Empirically, we also demonstrate that this rank robustness generalizes to ReLU nets. Finally, we conduct an extensive set of experiments on real-world data sets, which confirm that rank stability is indeed a crucial condition for training modern-day deep neural architectures. \ No newline at end of file diff --git a/data/2020/neurips/Batched Coarse Ranking in Multi-Armed Bandits b/data/2020/neurips/Batched Coarse Ranking in Multi-Armed Bandits new file mode 100644 index 0000000000..528c66c483 --- /dev/null +++ b/data/2020/neurips/Batched Coarse Ranking in Multi-Armed Bandits @@ -0,0 +1 @@ +We study the problem of coarse ranking in the multi-armed bandits (MAB) setting, where we have a set of arms each of which is associated with an unknown distribution. 
The task is to partition the arms into clusters of predefined sizes, such that the mean of any arm in the $i$-th cluster is larger than that of any arm in the $j$-th cluster for any $j > i$. Coarse ranking generalizes a number of basic problems in MAB (e.g., best arm identification) and has many real-world applications. We initiate the study of the problem in the batched model where we can only have a small number of policy changes. We study both the fixed budget and fixed confidence variants in MAB, and propose algorithms and prove impossibility results which together give almost tight tradeoffs between the total number of arm pulls and the number of policy changes. We have tested our algorithms on both real and synthetic data; our experimental results have demonstrated the efficiency of the proposed methods. \ No newline at end of file diff --git a/data/2020/neurips/Baxter Permutation Process b/data/2020/neurips/Baxter Permutation Process new file mode 100644 index 0000000000..8d8f475483 --- /dev/null +++ b/data/2020/neurips/Baxter Permutation Process @@ -0,0 +1 @@ +In this paper, a Bayesian nonparametric (BNP) model for Baxter permutations (BPs), termed the BP process (BPP), is proposed and applied to relational data analysis. The BPs are a well-studied class of permutations, and it has been demonstrated that there is a one-to-one correspondence between BPs and several interesting objects including floorplan partitioning (FP), which constitutes a subset of rectangular partitioning (RP). Accordingly, the BPP can be used as an FP model. We combine the BPP with a multi-dimensional extension of the stick-breaking process called the block-breaking process to fill the gap between FP and RP, and obtain a stochastic process on arbitrary RPs. Compared with conventional BNP models for arbitrary RPs, the proposed model is simpler and has a high affinity with Bayesian inference. \ No newline at end of file diff --git a/data/2020/neurips/BayReL: Bayesian Relational Learning for Multi-omics Data Integration b/data/2020/neurips/BayReL: Bayesian Relational Learning for Multi-omics Data Integration new file mode 100644 index 0000000000..ec3a69c446 --- /dev/null +++ b/data/2020/neurips/BayReL: Bayesian Relational Learning for Multi-omics Data Integration @@ -0,0 +1 @@ +High-throughput molecular profiling technologies have produced high-dimensional multi-omics data, enabling systematic understanding of living systems at the genome scale. Studying molecular interactions across different data types helps reveal signal transduction mechanisms across different classes of molecules. In this paper, we develop a novel Bayesian representation learning method that infers the relational interactions across multi-omics data types. Our method, Bayesian Relational Learning (BayReL) for multi-omics data integration, takes advantage of a priori known relationships among the same class of molecules, modeled as a graph at each corresponding view, to learn view-specific latent variables as well as a multi-partite graph that encodes the interactions across views. Our experiments on several real-world datasets demonstrate enhanced performance of BayReL in inferring meaningful interactions compared to existing baselines. \ No newline at end of file diff --git a/data/2020/neurips/Bayes Consistency vs. H-Consistency: The Interplay between Surrogate Loss Functions and the Scoring Function Class b/data/2020/neurips/Bayes Consistency vs.
H-Consistency: The Interplay between Surrogate Loss Functions and the Scoring Function Class new file mode 100644 index 0000000000..dbbeb4ec33 --- /dev/null +++ b/data/2020/neurips/Bayes Consistency vs. H-Consistency: The Interplay between Surrogate Loss Functions and the Scoring Function Class @@ -0,0 +1 @@ +A fundamental question in multiclass classification concerns understanding the consistency properties of surrogate risk minimization algorithms, which minimize an (often convex) surrogate to the multiclass 0-1 loss. In particular, the framework of calibrated surrogates has played an important role in analyzing Bayes consistency of such algorithms, i.e. in studying convergence to a Bayes optimal classifier (Zhang, 2004; Tewari and Bartlett, 2007). However, follow-up work has suggested this framework can be of limited value when studying $H$-consistency; in particular, concerns have been raised that even when the data comes from an underlying linear model, minimizing certain convex calibrated surrogates over linear scoring functions fails to recover the true model (Long and Servedio, 2013). In this paper, we investigate this apparent conundrum. We find that while some calibrated surrogates can indeed fail to provide $H$-consistency when minimized over a natural-looking but naïvely chosen scoring function class $F$, the situation can potentially be remedied by minimizing them over a more carefully chosen class of scoring functions $\overline{F}$. In particular, for the popular one-vs-all hinge and logistic surrogates, both of which are calibrated (and therefore provide Bayes consistency) under realizable models, but were previously shown to pose problems for realizable $H$-consistency, we derive a form of the scoring function class $\overline{F}$ that enables $H$-consistency. When $H$ is the class of linear models, the class $\overline{F}$ consists of certain piecewise linear scoring functions that are characterized by the same number of parameters as in the linear case, and over which minimization can be performed using an adaptation of the min-pooling idea from neural network training. Our experiments confirm that the one-vs-all surrogates, when trained over this class of nonlinear scoring functions $\overline{F}$, yield better linear multiclass classifiers than when trained over standard linear scoring functions. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Attention Modules b/data/2020/neurips/Bayesian Attention Modules new file mode 100644 index 0000000000..d38d3467a9 --- /dev/null +++ b/data/2020/neurips/Bayesian Attention Modules @@ -0,0 +1 @@ +Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main reason is that stochastic attention often introduces optimization issues or requires significant model changes. In this paper, we propose a scalable stochastic version of attention that is easy to implement and optimize. We construct simplex-constrained attention distributions by normalizing reparameterizable distributions, making the training process differentiable. We learn their parameters in a Bayesian framework where a data-dependent prior is introduced for regularization.
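The construction just described (normalize reparameterizable positive samples to get simplex-constrained, differentiable attention weights) can be sketched in a few lines. One reparameterizable choice is a Weibull sample via the inverse CDF; the shape parameter here is an assumption, not the paper's setting:

```python
import torch

def stochastic_attention(scores, k=0.5, eps=1e-8):
    """Sample simplex-constrained attention weights by drawing positive
    Weibull noise (inverse-CDF reparameterization, scale = exp(scores))
    and normalizing over the last dimension."""
    u = torch.rand_like(scores).clamp_min(eps)
    w = torch.exp(scores) * (-torch.log(u)).pow(1.0 / k)  # Weibull sample
    return w / w.sum(dim=-1, keepdim=True)
```

Because the sample is a deterministic, differentiable function of `scores` and the uniform noise, gradients flow through the attention weights during training.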
We apply the proposed stochastic attention modules to various attention-based models, with applications to graph node classification, visual question answering, image captioning, machine translation, and language understanding. Our experiments show the proposed method brings consistent improvements over the corresponding baselines. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Bits: Unifying Quantization and Pruning b/data/2020/neurips/Bayesian Bits: Unifying Quantization and Pruning new file mode 100644 index 0000000000..0ba8c6140b --- /dev/null +++ b/data/2020/neurips/Bayesian Bits: Unifying Quantization and Pruning @@ -0,0 +1 @@ +We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization. Bayesian Bits employs a novel decomposition of the quantization operation, which sequentially considers doubling the bit width. At each new bit width, the residual error between the full precision value and the previously rounded value is quantized. We then decide whether or not to add this quantized residual error for a higher effective bit width and lower quantization noise. By starting with a power-of-two bit width, this decomposition will always produce hardware-friendly configurations, and through an additional 0-bit option, serves as a unified view of pruning and quantization. Bayesian Bits then introduces learnable stochastic gates, which collectively control the bit width of the given tensor. As a result, we can obtain low bit solutions by performing approximate inference over the gates, with prior distributions that encourage most of them to be switched off. We experimentally validate our proposed method on several benchmark datasets and show that we can learn pruned, mixed precision networks that provide a better trade-off between accuracy and efficiency than their static bit width equivalents. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Causal Structural Learning with Zero-Inflated Poisson Bayesian Networks b/data/2020/neurips/Bayesian Causal Structural Learning with Zero-Inflated Poisson Bayesian Networks new file mode 100644 index 0000000000..5f25b2d03a --- /dev/null +++ b/data/2020/neurips/Bayesian Causal Structural Learning with Zero-Inflated Poisson Bayesian Networks @@ -0,0 +1 @@ +Without loss of generality, we assume that the nodes are labeled such that there is no directed edge in E from later node to earlier node. Such labeling is also known as perfect/topological ordering of DAG G. Define a set of edges that are connected to node j and have opposite directions in E and E′, re(j) = {k ∈ V : ejk = ekj = 1} for j = 1, . . . , p. For k ∈ re(j), E includes an edge k → j, while E′ has the reverse edge j → k. If re(j) = ∅ for all j, there exists no pair of nodes (j, k) such that ejk = ekj = 1. This means E = E ′, because Markov equivalent DAGs have the same skeleton. We will show by mathematical induction that re(j) = ∅ for all j, which contradicts the assumption that E 6= E′. For node p that is the last element of the perfect ordering of G, we have pa(p) = pa′(p) ∪ re(p) due to the same skeleton of G and G′. Taking the difference of the equality (1) at (x1, x2, . . . , xp + 1) and (x1, x2, . . . 
, xp) yields, ∑ \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Deep Ensembles via the Neural Tangent Kernel b/data/2020/neurips/Bayesian Deep Ensembles via the Neural Tangent Kernel new file mode 100644 index 0000000000..18e260eb2b --- /dev/null +++ b/data/2020/neurips/Bayesian Deep Ensembles via the Neural Tangent Kernel @@ -0,0 +1 @@ +We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs). Previous work has shown that even in the infinite width limit, when NNs become GPs, there is no GP posterior interpretation of a deep ensemble trained with squared error loss. We introduce a simple modification to standard deep ensemble training, through the addition of a computationally-tractable, randomised and untrainable function to each ensemble member, which enables a posterior interpretation in the infinite width limit. When ensembled together, our trained NNs give an approximation to a posterior predictive distribution, and we prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit. Finally, using finite width NNs we demonstrate that our Bayesian deep ensembles faithfully emulate the analytic posterior predictive when available, and can outperform standard deep ensembles in various out-of-distribution settings, for both regression and classification tasks. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Deep Learning and a Probabilistic Perspective of Generalization b/data/2020/neurips/Bayesian Deep Learning and a Probabilistic Perspective of Generalization new file mode 100644 index 0000000000..9aa70b2259 --- /dev/null +++ b/data/2020/neurips/Bayesian Deep Learning and a Probabilistic Perspective of Generalization @@ -0,0 +1 @@ +The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights. Bayesian marginalization can particularly improve the accuracy and calibration of modern deep neural networks, which are typically underspecified by the data, and can represent many compelling but different solutions. We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead. We also investigate the prior over functions implied by a vague distribution over neural network weights, explaining the generalization properties of such models from a probabilistic perspective. From this perspective, we explain results that have been presented as mysterious and distinct to neural network generalization, such as the ability to fit images with random labels, and show that these results can be reproduced with Gaussian processes. We also show that Bayesian model averaging alleviates double descent, resulting in monotonic performance improvements with increased flexibility. Finally, we provide a Bayesian perspective on tempering for calibrating predictive distributions.
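A hedged sketch of the marginalization mechanism described above: treat each independently trained ensemble member as an approximate posterior sample over weights and average predictive distributions rather than logits. Model handles are placeholders.

```python
import torch

@torch.no_grad()
def ensemble_predictive(models, x):
    # Approximate Bayesian model averaging with a deep ensemble:
    # p(y|x) is approximated by (1/M) * sum_m p(y|x, w_m).
    probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0)
```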
\ No newline at end of file diff --git a/data/2020/neurips/Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels b/data/2020/neurips/Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels new file mode 100644 index 0000000000..e604001fad --- /dev/null +++ b/data/2020/neurips/Bayesian Meta-Learning for the Few-Shot Setting via Deep Kernels @@ -0,0 +1 @@ +Recently, different machine learning methods have been introduced to tackle the challenging few-shot learning scenario, that is, learning from a small labeled dataset related to a specific task. Common approaches have taken the form of meta-learning: learning to learn on the new problem given the old. Following the recognition that meta-learning is implementing learning in a multi-level model, we present a Bayesian treatment for the meta-learning inner loop through the use of deep kernels. As a result, we can learn a kernel that transfers to new tasks; we call this Deep Kernel Transfer (DKT). This approach has many advantages: it is straightforward to implement as a single optimizer, provides uncertainty quantification, and does not require estimation of task-specific parameters. We empirically demonstrate that DKT outperforms several state-of-the-art algorithms in few-shot classification, and is the state of the art for cross-domain adaptation and regression. We conclude that complex meta-learning routines can be replaced by a simpler Bayesian model without loss of accuracy. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Multi-type Mean Field Multi-agent Imitation Learning b/data/2020/neurips/Bayesian Multi-type Mean Field Multi-agent Imitation Learning new file mode 100644 index 0000000000..50ec68911a --- /dev/null +++ b/data/2020/neurips/Bayesian Multi-type Mean Field Multi-agent Imitation Learning @@ -0,0 +1 @@ +Multi-agent imitation learning (MAIL) refers to the problem in which agents learn to perform a task interactively in a multi-agent system by observing and mimicking expert demonstrations, without any knowledge of a reward function from the environment. MAIL has received a lot of attention due to promising results achieved on synthesized tasks, with the potential to be applied to complex real-world multi-agent tasks. Key challenges for MAIL include sample efficiency and scalability. In this paper, we propose Bayesian multi-type mean field multi-agent imitation learning (BM3IL). Our method improves sample efficiency by establishing a Bayesian formulation for MAIL, and enhances scalability by introducing a new multi-type mean field approximation. We demonstrate the performance of our algorithm through benchmarking with three state-of-the-art multi-agent imitation learning algorithms on several tasks, including solving a multi-agent traffic optimization problem in a real-world transportation network. Experimental results indicate that our algorithm significantly outperforms all other algorithms in all scenarios. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Optimization for Iterative Learning b/data/2020/neurips/Bayesian Optimization for Iterative Learning new file mode 100644 index 0000000000..72d904d4d6 --- /dev/null +++ b/data/2020/neurips/Bayesian Optimization for Iterative Learning @@ -0,0 +1 @@ +The performance of deep (reinforcement) learning systems crucially depends on the choice of hyperparameters. Their tuning is notoriously expensive, typically requiring an iterative training process to run for numerous steps to convergence.
Traditional tuning algorithms only consider the final performance of hyperparameters acquired after many expensive iterations and ignore intermediate information from earlier training steps. In this paper, we present a Bayesian optimization (BO) approach which exploits the iterative structure of learning algorithms for efficient hyperparameter tuning. We propose to learn an evaluation function compressing learning progress at any stage of the training process into a single numeric score according to both training success and stability. Our BO framework then trades off the benefit of assessing a hyperparameter setting over additional training steps against the associated computation cost. We further increase model efficiency by selectively including scores from different training steps for any evaluated hyperparameter set. We demonstrate the efficiency of our algorithm by tuning hyperparameters for the training of deep reinforcement learning agents and convolutional neural networks. Our algorithm outperforms all existing baselines in identifying optimal hyperparameters in minimal time. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Optimization of Risk Measures b/data/2020/neurips/Bayesian Optimization of Risk Measures new file mode 100644 index 0000000000..049076f655 --- /dev/null +++ b/data/2020/neurips/Bayesian Optimization of Risk Measures @@ -0,0 +1 @@ +In this study, we propose a novel multi-objective Bayesian optimization (MOBO) method to efficiently identify the Pareto front (PF) defined by risk measures for black-box functions under the presence of input uncertainty (IU). Existing BO methods for Pareto optimization in the presence of IU are risk-specific or lack theoretical guarantees, whereas our proposed method addresses general risk measures and has theoretical guarantees. The basic idea of the proposed method is to assume a Gaussian process (GP) model for the black-box function and to construct high-probability bounding boxes for the risk measures using the GP model. Furthermore, in order to reduce the uncertainty of non-dominated bounding boxes, we propose a method of selecting the next evaluation point using a maximin distance defined by the maximum value of a quasi-distance based on bounding boxes. As a theoretical analysis, we prove that the algorithm can return an arbitrarily accurate solution in a finite number of iterations with high probability, for various risk measures such as Bayes risk, worst-case risk, and value-at-risk. We also give a theoretical analysis that takes into account approximation errors, because non-negligible approximation errors (e.g., finite approximation of PFs and sampling-based approximation of bounding boxes) exist in practice. Through numerical experiments, we confirm that the proposed method outperforms existing methods not only in the setting with IU but also in the ordinary MOBO setting. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Probabilistic Numerical Integration with Tree-Based Models b/data/2020/neurips/Bayesian Probabilistic Numerical Integration with Tree-Based Models new file mode 100644 index 0000000000..67f59c7045 --- /dev/null +++ b/data/2020/neurips/Bayesian Probabilistic Numerical Integration with Tree-Based Models @@ -0,0 +1 @@ +Bayesian quadrature (BQ) is a method for solving numerical integration problems in a Bayesian manner, which allows users to quantify their uncertainty about the solution.
The standard approach to BQ is based on a Gaussian process (GP) approximation of the integrand. As a result, the BQ approach is inherently limited to cases where GP approximations can be done in an efficient manner, often ruling out high-dimensional or non-smooth target functions. This paper proposes to tackle this issue with a new Bayesian numerical integration algorithm based on Bayesian Additive Regression Trees (BART) priors, which we call BART-Int. BART priors are easy to tune and well-suited for discontinuous functions. We demonstrate that they also lend themselves naturally to a sequential design setting and that explicit convergence rates can be obtained in a variety of settings. The advantages and disadvantages of this new methodology are highlighted on a set of benchmark tests including the Genz functions, and on a Bayesian survey design problem. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Pseudocoresets b/data/2020/neurips/Bayesian Pseudocoresets new file mode 100644 index 0000000000..60b4abe7cc --- /dev/null +++ b/data/2020/neurips/Bayesian Pseudocoresets @@ -0,0 +1 @@ +A Bayesian pseudocoreset is a small synthetic dataset for which the posterior over parameters approximates that of the original dataset. While promising, the scalability of Bayesian pseudocoresets has not yet been validated in realistic problems such as image classification with deep neural networks. On the other hand, dataset distillation methods similarly construct a small dataset such that the optimization using the synthetic dataset converges to a solution with performance competitive with optimization using the full data. Although dataset distillation has been empirically verified in large-scale settings, the framework is restricted to point estimates, and its adaptation to Bayesian inference has not been explored. This paper casts two representative dataset distillation algorithms as approximations to methods for constructing pseudocoresets by minimizing specific divergence measures: reverse KL divergence and Wasserstein distance. Furthermore, we provide a unifying view of such divergence measures in Bayesian pseudocoreset construction. Finally, we propose a novel Bayesian pseudocoreset algorithm based on minimizing forward KL divergence. Our empirical results demonstrate that the pseudocoresets constructed from these methods reflect the true posterior even in high-dimensional Bayesian inference problems. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian Robust Optimization for Imitation Learning b/data/2020/neurips/Bayesian Robust Optimization for Imitation Learning new file mode 100644 index 0000000000..ecb2daa314 --- /dev/null +++ b/data/2020/neurips/Bayesian Robust Optimization for Imitation Learning @@ -0,0 +1 @@ +One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and corresponding optimal policy. Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches optimize a policy for either the mean or the MAP reward function.
While completely ignoring risk can lead to overly aggressive and unsafe policies, optimizing in a fully adversarial sense is also problematic, as it can lead to overly conservative policies that perform poorly in practice. To provide a bridge between these two extremes, we propose Bayesian Robust Optimization for Imitation Learning (BROIL). BROIL leverages Bayesian reward function inference and a user-specific risk tolerance to efficiently optimize a robust policy that balances expected return and conditional value at risk. Our empirical results show that BROIL provides a natural way to interpolate between return-maximizing and risk-minimizing behaviors and outperforms existing risk-sensitive and risk-neutral inverse reinforcement learning algorithms. \ No newline at end of file diff --git a/data/2020/neurips/Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods b/data/2020/neurips/Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods new file mode 100644 index 0000000000..d4d709d367 --- /dev/null +++ b/data/2020/neurips/Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods @@ -0,0 +1 @@ +We formulate the problem of neural network optimization as Bayesian filtering, where the observations are the backpropagated gradients. While neural network optimization has previously been studied using natural gradient methods, which are closely related to Bayesian inference, such methods were unable to recover standard optimizers such as Adam and RMSprop with a root-mean-square gradient normalizer, instead getting a mean-square normalizer. To recover the root-mean-square normalizer, we find it necessary to account for the temporal dynamics of all the other parameters as they are being optimized. The resulting optimizer, AdaBayes, adaptively transitions between SGD-like and Adam-like behaviour, automatically recovers AdamW, a state-of-the-art variant of Adam with decoupled weight decay, and has generalisation performance competitive with SGD. \ No newline at end of file diff --git a/data/2020/neurips/Belief Propagation Neural Networks b/data/2020/neurips/Belief Propagation Neural Networks new file mode 100644 index 0000000000..35f14f9fe0 --- /dev/null +++ b/data/2020/neurips/Belief Propagation Neural Networks @@ -0,0 +1 @@ +Learned neural solvers have successfully been used to solve combinatorial optimization and decision problems. More general counting variants of these problems, however, are still largely solved with hand-crafted solvers. To bridge this gap, we introduce belief propagation neural networks (BPNNs), a class of parameterized operators that operate on factor graphs and generalize Belief Propagation (BP). In its strictest form, a BPNN layer (BPNN-D) is a learned iterative operator that provably maintains many of the desirable properties of BP for any choice of the parameters. Empirically, we show that, once trained, BPNN-D performs the task better than the original BP: it converges 1.7x faster on Ising models while providing tighter bounds. On challenging model counting problems, BPNNs compute estimates hundreds of times faster than state-of-the-art handcrafted methods, while returning an estimate of comparable quality.
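For reference, a small sketch of the classical loopy belief propagation iteration on an Ising model, the hand-crafted baseline that BPNN layers generalize; the log-odds message parameterization and damping constant are illustrative choices.

```python
import numpy as np

def ising_bp_marginals(J, h, iters=50, damping=0.5):
    # J: symmetric coupling matrix (zero off the graph); h: unary fields.
    n = J.shape[0]
    m = np.zeros((n, n))                 # message i -> j in log-odds form
    for _ in range(iters):
        new = np.zeros_like(m)
        for i in range(n):
            for j in range(n):
                if J[i, j] == 0.0:
                    continue
                # cavity field at i, excluding the message from j
                cav = h[i] + m[:, i].sum() - m[j, i]
                new[i, j] = np.arctanh(np.tanh(J[i, j]) * np.tanh(cav))
        m = damping * m + (1.0 - damping) * new
    belief = h + m.sum(axis=0)
    return 1.0 / (1.0 + np.exp(-2.0 * belief))   # P(x_i = +1)
```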
\ No newline at end of file diff --git a/data/2020/neurips/Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information b/data/2020/neurips/Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information new file mode 100644 index 0000000000..fd8ac0f21b --- /dev/null +++ b/data/2020/neurips/Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information @@ -0,0 +1 @@ +This work introduces macro-action discovery using value-of-information (VoI) for robust and efficient planning in partially observable Markov decision processes (POMDPs). POMDPs are a powerful framework for planning under uncertainty. Previous approaches have used high-level macro-actions within POMDP policies to reduce planning complexity. However, macro-action design is often heuristic and rarely comes with performance guarantees. Here, we present a method for extracting belief-dependent, variable-length macro-actions directly from a low-level POMDP model. We construct macro-actions by chaining sequences of open-loop actions together when the task-specific VoI, that is, the change in expected task performance caused by observations in the current planning iteration, is low. Importantly, we provide performance guarantees on the resulting VoI macro-action policies in the form of bounded regret relative to the optimal policy. In simulated tracking experiments, we achieve higher reward than both closed-loop and hand-coded macro-action baselines, selectively using VoI macro-actions to reduce planning complexity while maintaining near-optimal task performance. \ No newline at end of file diff --git a/data/2020/neurips/Benchmarking Deep Inverse Models over time, and the Neural-Adjoint method b/data/2020/neurips/Benchmarking Deep Inverse Models over time, and the Neural-Adjoint method new file mode 100644 index 0000000000..618827afa2 --- /dev/null +++ b/data/2020/neurips/Benchmarking Deep Inverse Models over time, and the Neural-Adjoint method @@ -0,0 +1 @@ +We consider the task of solving generic inverse problems, where one wishes to determine the hidden parameters of a natural system that will give rise to a particular set of measurements. Recently, many new approaches based upon deep learning have arisen, generating impressive results. We conceptualize these models as different schemes for efficiently, but randomly, exploring the space of possible inverse solutions. As a result, the accuracy of each approach should be evaluated as a function of time rather than by a single estimated solution, as is often done now. Using this metric, we compare several state-of-the-art inverse modeling approaches on four benchmark tasks: two existing tasks, one simple task for visualization, and one new task from metamaterial design. Finally, inspired by our conception of the inverse problem, we explore a solution that uses a deep learning model to approximate the forward model, and then uses backpropagation to search for good inverse solutions. This approach, termed the neural-adjoint, achieves the best performance in many scenarios.
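A minimal sketch of the neural-adjoint search loop just described, assuming a pre-trained differentiable surrogate `forward_net` of the forward model; the optimizer choice and step count are illustrative.

```python
import torch

def neural_adjoint_inverse(forward_net, y_target, x_init, steps=200, lr=1e-2):
    # Only the candidate input x is optimized; the surrogate's weights are
    # untouched because x alone is handed to the optimizer.
    x = x_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(forward_net(x), y_target)
        loss.backward()
        opt.step()
    return x.detach()
```

In practice one would restart from many random `x_init` values and report accuracy as a function of time, in line with the evaluation metric proposed above.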
\ No newline at end of file diff --git a/data/2020/neurips/Benchmarking Deep Learning Interpretability in Time Series Predictions b/data/2020/neurips/Benchmarking Deep Learning Interpretability in Time Series Predictions new file mode 100644 index 0000000000..43b78137f8 --- /dev/null +++ b/data/2020/neurips/Benchmarking Deep Learning Interpretability in Time Series Predictions @@ -0,0 +1 @@ +Saliency methods are used extensively to highlight the importance of input features in model predictions. These methods are mostly used in vision and language tasks, and their application to time series data is relatively unexplored. In this paper, we set out to extensively compare the performance of various saliency-based interpretability methods across diverse neural architectures, including Recurrent Neural Networks, Temporal Convolutional Networks, and Transformers, in a new benchmark of synthetic time series data. We propose and report multiple metrics to empirically evaluate the performance of saliency methods for detecting feature importance over time using both precision (i.e., whether identified features contain meaningful signals) and recall (i.e., the number of features with signal identified as important). Through several experiments, we show that (i) in general, network architectures and saliency methods fail to reliably and accurately identify feature importance over time in time series data, (ii) this failure is mainly due to the conflation of time and feature domains, and (iii) the quality of saliency maps can be improved substantially by using our proposed two-step temporal saliency rescaling (TSR) approach that first calculates the importance of each time step before calculating the importance of each feature at a time step. \ No newline at end of file diff --git a/data/2020/neurips/Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs b/data/2020/neurips/Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs new file mode 100644 index 0000000000..e75b07d4c1 --- /dev/null +++ b/data/2020/neurips/Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs @@ -0,0 +1 @@ +One of the fundamental problems in Artificial Intelligence is to perform complex multi-hop logical reasoning over the facts captured by a knowledge graph (KG). This problem is challenging, because KGs can be massive and incomplete. Recent approaches embed KG entities in a low-dimensional space and then use these embeddings to find the answer entities. However, it has remained an outstanding challenge how to handle arbitrary first-order logic (FOL) queries, as present methods are limited to only a subset of FOL operators. In particular, the negation operator is not supported. An additional limitation of present methods is that they cannot naturally model uncertainty. Here, we present BetaE, a probabilistic embedding framework for answering arbitrary FOL queries over KGs. BetaE is the first method that can handle a complete set of first-order logical operations: conjunction ($\wedge$), disjunction ($\vee$), and negation ($\neg$). A key insight of BetaE is to use probabilistic distributions with bounded support, specifically the Beta distribution, and embed queries/entities as distributions, which as a consequence allows us to also faithfully model uncertainty. Logical operations are performed in the embedding space by neural operators over the probabilistic embeddings. We demonstrate the performance of BetaE on answering arbitrary FOL queries on three large, incomplete KGs.
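Under our reading of the construction just described, the logical operators can act directly on the Beta parameters; the sketch below shows conjunction as an attention-weighted product of Beta densities (again a Beta when the weights are convex) and negation as taking reciprocals of the parameters. Treat the parameter conventions as assumptions rather than the authors' exact operators.

```python
import torch

def beta_intersection(alpha, beta, att_logits):
    # alpha, beta: (num_conjuncts, dim) Beta parameters of the inputs.
    # A convex-weighted geometric mean of Beta densities is again a Beta
    # with parameters given by the same convex combination.
    w = torch.softmax(att_logits, dim=0).unsqueeze(-1)
    return (w * alpha).sum(dim=0), (w * beta).sum(dim=0)

def beta_negation(alpha, beta):
    # Reciprocal parameters flip high-density regions to low-density ones.
    return 1.0 / alpha, 1.0 / beta
```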
While being more general, BetaE also increases relative performance by up to 25.4% over the current state-of-the-art KG reasoning methods that can only handle conjunctive queries without negation. \ No newline at end of file diff --git a/data/2020/neurips/Beta R-CNN: Looking into Pedestrian Detection from Another Perspective b/data/2020/neurips/Beta R-CNN: Looking into Pedestrian Detection from Another Perspective new file mode 100644 index 0000000000..b1ecb13be1 --- /dev/null +++ b/data/2020/neurips/Beta R-CNN: Looking into Pedestrian Detection from Another Perspective @@ -0,0 +1 @@ +Recently, significant progress has been made in pedestrian detection, but it remains challenging to achieve high performance in occluded and crowded scenes. This can be attributed mostly to the widely used representation of pedestrians, i.e., the 2D axis-aligned bounding box, which describes only the approximate location and size of the object. A bounding box models the object as a uniform distribution within the boundary, making pedestrians indistinguishable in occluded and crowded scenes owing to the resulting noise. To eliminate the problem, we propose a novel representation based on the 2D beta distribution, named Beta Representation. It pictures a pedestrian by explicitly constructing the relationship between full-body and visible boxes, and emphasizes the center of visual mass by assigning different probability values to pixels. As a result, Beta Representation is much better for distinguishing highly-overlapped instances in crowded scenes with a new NMS strategy named BetaNMS. What’s more, to fully exploit Beta Representation, a novel pipeline, Beta R-CNN, equipped with BetaHead and BetaMask, is proposed, leading to high detection performance in occluded and crowded scenes. \ No newline at end of file diff --git a/data/2020/neurips/Better Full-Matrix Regret via Parameter-Free Online Learning b/data/2020/neurips/Better Full-Matrix Regret via Parameter-Free Online Learning new file mode 100644 index 0000000000..c77964840c --- /dev/null +++ b/data/2020/neurips/Better Full-Matrix Regret via Parameter-Free Online Learning @@ -0,0 +1 @@ +We provide online convex optimization algorithms that guarantee improved full-matrix regret bounds. These algorithms extend prior work in several ways. First, we seamlessly allow for the incorporation of constraints without requiring unknown oracle-tuning for any learning rate parameters. Second, we improve the regret analysis of the full-matrix AdaGrad algorithm by suggesting a better learning rate value and showing how to tune the learning rate to this value on-the-fly. Third, all our bounds are obtained via a general framework for constructing regret bounds that depend on an arbitrary sequence of norms. \ No newline at end of file diff --git a/data/2020/neurips/Better Set Representations For Relational Reasoning b/data/2020/neurips/Better Set Representations For Relational Reasoning new file mode 100644 index 0000000000..d74fb90eee --- /dev/null +++ b/data/2020/neurips/Better Set Representations For Relational Reasoning @@ -0,0 +1 @@ +Incorporating relational reasoning into neural networks has greatly expanded their capabilities and scope. One defining trait of relational reasoning is that it operates on a set of entities, as opposed to standard vector representations. Existing end-to-end approaches typically extract entities from inputs by directly interpreting the latent feature representations as a set.
We show that these approaches do not respect set permutation invariance and thus have fundamental representational limitations. To resolve this limitation, we propose a simple and general network module called a Set Refiner Network (SRN). We first use synthetic image experiments to demonstrate how our approach effectively decomposes objects without explicit supervision. Then, we insert our module into existing relational reasoning models and show that respecting set invariance leads to substantial gains in prediction performance and robustness on several relational reasoning tasks. \ No newline at end of file diff --git a/data/2020/neurips/Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs b/data/2020/neurips/Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs new file mode 100644 index 0000000000..65ebe79edb --- /dev/null +++ b/data/2020/neurips/Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs @@ -0,0 +1 @@ +We investigate the representation power of graph neural networks in the semi-supervised node classification task under heterophily or low homophily, i.e., in networks where connected nodes may have different class labels and dissimilar features. Many popular GNNs fail to generalize to this setting, and are even outperformed by models that ignore the graph structure (e.g., multilayer perceptrons). Motivated by this limitation, we identify a set of key designs -- ego- and neighbor-embedding separation, higher-order neighborhoods, and combination of intermediate representations -- that boost learning from the graph structure under heterophily. We combine them into a graph neural network, H2GCN, which we use as the base method to empirically evaluate the effectiveness of the identified designs. Going beyond the traditional benchmarks with strong homophily, our empirical analysis shows that the identified designs increase the accuracy of GNNs by up to 40% and 27% over models without them on synthetic and real networks with heterophily, respectively, and yield competitive performance under homophily. \ No newline at end of file diff --git a/data/2020/neurips/Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses b/data/2020/neurips/Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses new file mode 100644 index 0000000000..9918d06a74 --- /dev/null +++ b/data/2020/neurips/Beyond Individualized Recourse: Interpretable and Interactive Summaries of Actionable Recourses @@ -0,0 +1 @@ +As predictive models are increasingly being deployed in high-stakes decision-making, there has been a lot of interest in developing algorithms which can provide recourses to affected individuals. While developing such tools is important, it is even more critical to analyse and interpret a predictive model, and vet it thoroughly to ensure that the recourses it offers are meaningful and non-discriminatory before it is deployed in the real world. To this end, we propose a novel model-agnostic framework called Actionable Recourse Summaries (AReS) to construct global counterfactual explanations which provide an interpretable and accurate summary of recourses for the entire population. We formulate a novel objective which simultaneously optimizes for correctness of the recourses and interpretability of the explanations, while minimizing overall recourse costs across the entire population.
More specifically, our objective enables us to learn, with optimality guarantees on recourse correctness, a small number of compact rule sets, each of which captures recourses for well-defined subpopulations within the data. We also demonstrate theoretically that several of the prior approaches proposed to generate recourses for individuals are special cases of our framework. Experimental evaluation with real-world datasets and user studies demonstrate that our framework can provide decision makers with a comprehensive overview of recourses corresponding to any black-box model, and consequently help detect undesirable model biases and discrimination. \ No newline at end of file diff --git a/data/2020/neurips/Beyond Lazy Training for Over-parameterized Tensor Decomposition b/data/2020/neurips/Beyond Lazy Training for Over-parameterized Tensor Decomposition new file mode 100644 index 0000000000..24f91827cf --- /dev/null +++ b/data/2020/neurips/Beyond Lazy Training for Over-parameterized Tensor Decomposition @@ -0,0 +1 @@ +Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad locally optimal solutions. In this paper, we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(R^d)^{\otimes l}$ of rank $r$ (where $r\ll d$), can variants of gradient descent find a rank $m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$. Our results show that gradient descent on an over-parametrized objective can go beyond the lazy training regime and utilize certain low-rank structure in the data. \ No newline at end of file diff --git a/data/2020/neurips/Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test Examples b/data/2020/neurips/Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test Examples new file mode 100644 index 0000000000..678f310e56 --- /dev/null +++ b/data/2020/neurips/Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test Examples @@ -0,0 +1 @@ +We present a transductive learning algorithm that takes as input training examples from a distribution $P$ and arbitrary (unlabeled) test examples, possibly chosen by an adversary. This is unlike prior work that assumes that test examples are small perturbations of $P$. Our algorithm outputs a selective classifier, which abstains from predicting on some examples. By considering selective transductive learning, we give the first nontrivial guarantees for learning classes of bounded VC dimension with arbitrary train and test distributions---no prior guarantees were known even for simple classes of functions such as intervals on the line. In particular, for any function in a class $C$ of bounded VC dimension, we guarantee a low test error rate and a low rejection rate with respect to $P$. Our algorithm is efficient given an Empirical Risk Minimizer (ERM) for $C$. Our guarantees hold even for test examples chosen by an unbounded white-box adversary. We also give guarantees for generalization, agnostic, and unsupervised settings.
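A generic illustration of the selective-classifier interface used above, predicting only when confident and abstaining otherwise; the confidence-threshold rule is a placeholder, not the paper's transductive algorithm.

```python
import numpy as np

def selective_predict(probs, threshold=0.9, abstain=-1):
    # probs: (n_examples, n_classes) predicted class probabilities.
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= threshold, preds, abstain)
```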
\ No newline at end of file diff --git a/data/2020/neurips/Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency b/data/2020/neurips/Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency new file mode 100644 index 0000000000..0abee77e3c --- /dev/null +++ b/data/2020/neurips/Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency @@ -0,0 +1 @@ +A central problem in cognitive science and behavioural neuroscience as well as in machine learning and artificial intelligence research is to ascertain whether two or more decision makers (e.g. brains or algorithms) use the same strategy. Accuracy alone cannot distinguish between strategies: two systems may achieve similar accuracy with very different strategies. The need to differentiate beyond accuracy is particularly pressing if two systems are at or near ceiling performance, like Convolutional Neural Networks (CNNs) and humans on visual object recognition. Here we introduce trial-by-trial error consistency, a quantitative analysis for measuring whether two decision-making systems systematically make errors on the same inputs. Making consistent errors on a trial-by-trial basis is a necessary condition if we want to ascertain similar processing strategies between decision makers. Our analysis can be used to compare algorithms with algorithms, humans with humans, and algorithms with humans. When applying error consistency to visual object recognition, we obtain three main findings: (1) irrespective of architecture, CNNs are remarkably consistent with one another; (2) the consistency between CNNs and human observers, however, is little above what can be expected by chance alone, indicating that humans and CNNs are likely implementing very different strategies; and (3) CORnet-S, a recurrent model termed the "current best model of the primate ventral visual stream", fails to capture essential characteristics of human behavioural data and behaves essentially like a ResNet-50 in our analysis, that is, just like a standard feedforward network. Taken together, error consistency analysis suggests that the strategies used by human and machine vision are still very different, but we envision our general-purpose error consistency analysis to serve as a fruitful tool for quantifying future progress. \ No newline at end of file diff --git a/data/2020/neurips/Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties b/data/2020/neurips/Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties new file mode 100644 index 0000000000..b1ddc3cd05 --- /dev/null +++ b/data/2020/neurips/Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties @@ -0,0 +1 @@ +Deep Gaussian Processes learn probabilistic data representations for supervised learning by cascading multiple Gaussian Processes. While this model family promises flexible predictive distributions, exact inference is not tractable. Approximate inference techniques trade off the ability to closely resemble the posterior distribution against speed of convergence and computational efficiency. We propose a novel Gaussian variational family that allows for retaining covariances between latent processes while achieving fast convergence by marginalising out all global latent variables.
After providing a proof of how this marginalisation can be done for general covariances, we restrict the covariances to those we empirically found to be most important in order to also achieve computational efficiency. We provide an efficient implementation of our new approach and apply it to several regression benchmark datasets. We find that it yields more accurate predictive distributions, in particular for test data points that are distant from the training set. \ No newline at end of file diff --git a/data/2020/neurips/Bi-level Score Matching for Learning Energy-based Latent Variable Models b/data/2020/neurips/Bi-level Score Matching for Learning Energy-based Latent Variable Models new file mode 100644 index 0000000000..41937e723d --- /dev/null +++ b/data/2020/neurips/Bi-level Score Matching for Learning Energy-based Latent Variable Models @@ -0,0 +1 @@ +Score matching (SM) provides a compelling approach to learn energy-based models (EBMs) by avoiding the calculation of the partition function. However, learning energy-based latent variable models (EBLVMs) remains largely an open problem, except in some special cases. This paper presents a bi-level score matching (BiSM) method to learn EBLVMs with general structures by reformulating SM as a bi-level optimization problem. The higher level introduces a variational posterior of the latent variables and optimizes a modified SM objective, and the lower level optimizes the variational posterior to fit the true posterior. To solve BiSM efficiently, we develop a stochastic optimization algorithm with gradient unrolling. Theoretically, we analyze the consistency of BiSM and the convergence of the stochastic algorithm. Empirically, we show the promise of BiSM in Gaussian restricted Boltzmann machines and highly nonstructural EBLVMs parameterized by deep convolutional neural networks. BiSM is comparable to the widely adopted contrastive divergence and SM methods when they are applicable, and can learn complex EBLVMs with intractable posteriors to generate natural images. \ No newline at end of file diff --git a/data/2020/neurips/Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs b/data/2020/neurips/Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs new file mode 100644 index 0000000000..6300a65b83 --- /dev/null +++ b/data/2020/neurips/Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs @@ -0,0 +1,2 @@ +We develop a new approach to obtaining high-probability regret bounds for online learning with bandit feedback against an adaptive adversary. While existing approaches all require carefully constructing optimistic and biased loss estimators, our approach uses standard unbiased estimators and relies on a simple increasing learning rate schedule, together with the help of logarithmically homogeneous self-concordant barriers and a strengthened Freedman's inequality. +Besides its simplicity, our approach enjoys several advantages. First, the obtained high-probability regret bounds are data-dependent and can be much smaller than the worst-case bounds, which resolves an open problem raised by Neu (2015). Second, resolving another open problem of Bartlett et al. (2008) and Abernethy and Rakhlin (2009), our approach leads to the first general and efficient algorithm with a high-probability regret bound for adversarial linear bandits, while previous methods are either inefficient or only applicable to specific action sets.
Finally, our approach can also be applied to learning adversarial Markov Decision Processes and provides the first algorithm with a high-probability small-loss bound for this problem. \ No newline at end of file diff --git a/data/2020/neurips/Bidirectional Convolutional Poisson Gamma Dynamical Systems b/data/2020/neurips/Bidirectional Convolutional Poisson Gamma Dynamical Systems new file mode 100644 index 0000000000..41622b4720 --- /dev/null +++ b/data/2020/neurips/Bidirectional Convolutional Poisson Gamma Dynamical Systems @@ -0,0 +1 @@ +, \ No newline at end of file diff --git a/data/2020/neurips/Big Bird: Transformers for Longer Sequences b/data/2020/neurips/Big Bird: Transformers for Longer Sequences new file mode 100644 index 0000000000..61943313d2 --- /dev/null +++ b/data/2020/neurips/Big Bird: Transformers for Longer Sequences @@ -0,0 +1 @@ +Transformer-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis reveals some of the benefits of having $O(1)$ global tokens (such as CLS) that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to 8x what was previously possible using similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data. \ No newline at end of file diff --git a/data/2020/neurips/Big Self-Supervised Models are Strong Semi-Supervised Learners b/data/2020/neurips/Big Self-Supervised Models are Strong Semi-Supervised Learners new file mode 100644 index 0000000000..3d8a39831c --- /dev/null +++ b/data/2020/neurips/Big Self-Supervised Models are Strong Semi-Supervised Learners @@ -0,0 +1 @@ +One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to most previous approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of a big (deep and wide) network during pretraining and fine-tuning. We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2 (a modification of SimCLR), supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge.
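A hedged sketch of the third step of the recipe just summarized: distill the fine-tuned teacher into a smaller student on unlabeled images by matching softened predictions. The temperature and exact loss form are assumptions.

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=1.0):
    # Cross-entropy between temperature-softened teacher and student outputs.
    t = F.softmax(teacher_logits / T, dim=-1)
    s = F.log_softmax(student_logits / T, dim=-1)
    return -(t * s).sum(dim=-1).mean()
```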
This procedure achieves 73.9\% ImageNet top-1 accuracy with just 1\% of the labels ($\le$13 labeled images per class) using ResNet-50, a $10\times$ improvement in label efficiency over the previous state-of-the-art. With 10\% of labels, ResNet-50 trained with our method achieves 77.5\% top-1 accuracy, outperforming standard supervised training with all of the labels. \ No newline at end of file diff --git a/data/2020/neurips/Biological credit assignment through dynamic inversion of feedforward networks b/data/2020/neurips/Biological credit assignment through dynamic inversion of feedforward networks new file mode 100644 index 0000000000..f64b04c9c3 --- /dev/null +++ b/data/2020/neurips/Biological credit assignment through dynamic inversion of feedforward networks @@ -0,0 +1 @@ +Learning depends on changes in synaptic connections deep inside the brain. In multilayer networks, these changes are triggered by error signals fed back from the output, generally through a stepwise inversion of the feedforward processing steps. The gold standard for this process -- backpropagation -- works well in artificial neural networks, but is biologically implausible. Several recent proposals have emerged to address this problem, but many of these biologically-plausible schemes are based on learning an independent set of feedback connections. This complicates the assignment of errors to each synapse by making it dependent upon a second learning problem, and by fitting inversions rather than guaranteeing them. Here, we show that feedforward network transformations can be effectively inverted through dynamics. We derive this dynamic inversion from the perspective of feedback control, where the forward transformation is reused and dynamically interacts with fixed or random feedback to propagate error signals during the backward pass. Importantly, this scheme does not rely upon a second learning problem for feedback because accurate inversion is guaranteed through the network dynamics. We map these dynamics onto generic feedforward networks, and show that the resulting algorithm performs well on several supervised and unsupervised datasets. We also link this dynamic inversion to Gauss-Newton optimization, suggesting a biologically-plausible approximation to second-order learning. Overall, our work introduces an alternative perspective on credit assignment in the brain, and proposes a special role for temporal dynamics and feedback control during learning. \ No newline at end of file diff --git a/data/2020/neurips/Biologically Inspired Mechanisms for Adversarial Robustness b/data/2020/neurips/Biologically Inspired Mechanisms for Adversarial Robustness new file mode 100644 index 0000000000..ee2e04f399 --- /dev/null +++ b/data/2020/neurips/Biologically Inspired Mechanisms for Adversarial Robustness @@ -0,0 +1 @@ +A convolutional neural network strongly robust to adversarial perturbations at reasonable computational and performance cost has not yet been demonstrated. The primate visual ventral stream seems to be robust to small perturbations in visual stimuli, but the underlying mechanisms that give rise to this robust perception are not understood. In this work, we investigate the role of two biologically plausible mechanisms in adversarial robustness. We demonstrate that the non-uniform sampling performed by the primate retina and the presence of multiple receptive fields with a range of receptive field sizes at each eccentricity improve the robustness of neural networks to small adversarial perturbations.
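A sketch of the first mechanism, non-uniform retinal sampling, implemented as image resampling on a grid whose density decays with eccentricity; the radial power-law warp and its exponent are illustrative assumptions, not the paper's exact transform.

```python
import torch
import torch.nn.functional as F

def retinal_sample(img, out_size=64, alpha=3.0):
    # img: (N, C, H, W); output pixels near the center sample the input
    # densely (the 'fovea'), while peripheral pixels sample sparsely.
    n = img.shape[0]
    lin = torch.linspace(-1.0, 1.0, out_size)
    yy, xx = torch.meshgrid(lin, lin, indexing="ij")
    r = (xx ** 2 + yy ** 2).sqrt().clamp_min(1e-6)
    warp = r.pow(alpha - 1.0).clamp(max=1.0)        # radial power-law warp
    grid = torch.stack((xx * warp, yy * warp), dim=-1)
    grid = grid.unsqueeze(0).expand(n, -1, -1, -1)
    return F.grid_sample(img, grid, align_corners=True)
```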
We verify that these two mechanisms do not suffer from gradient obfuscation and study their contribution to adversarial robustness through ablation studies. \ No newline at end of file diff --git a/data/2020/neurips/Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework b/data/2020/neurips/Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework new file mode 100644 index 0000000000..ba0cf71fdb --- /dev/null +++ b/data/2020/neurips/Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework @@ -0,0 +1 @@ +Randomized classifiers have been shown to provide a promising approach for achieving certified robustness against adversarial attacks in deep learning. However, most existing methods only leverage Gaussian smoothing noise and only work for $\ell_2$ perturbations. We propose a general framework of adversarial certification with non-Gaussian noise and for more general types of attacks, from a unified functional optimization perspective. Our new framework allows us to identify a key trade-off between accuracy and robustness via designing smoothing distributions, helping to design new families of non-Gaussian smoothing distributions that work more efficiently for different $\ell_p$ settings, including $\ell_1$, $\ell_2$, and $\ell_\infty$ attacks. Our proposed methods achieve better certification results than previous works and provide a new perspective on randomized smoothing certification. \ No newline at end of file diff --git a/data/2020/neurips/Black-Box Optimization with Local Generative Surrogates b/data/2020/neurips/Black-Box Optimization with Local Generative Surrogates new file mode 100644 index 0000000000..b99a6e17df --- /dev/null +++ b/data/2020/neurips/Black-Box Optimization with Local Generative Surrogates @@ -0,0 +1 @@ +We propose a novel method for gradient-based optimization of black-box simulators using differentiable local surrogate models. In fields such as physics and engineering, many processes are modeled with non-differentiable simulators with intractable likelihoods. Optimization of these forward models is particularly challenging, especially when the simulator is stochastic. To address such cases, we introduce the use of deep generative models to iteratively approximate the simulator in local neighborhoods of the parameter space. We demonstrate that these local surrogates can be used to approximate the gradient of the simulator, and thus enable gradient-based optimization of simulator parameters. In cases where the dependence of the simulator on the parameter space is constrained to a low-dimensional submanifold, we observe that our method attains minima faster than baseline methods, including Bayesian optimization, numerical optimization, and approaches using score function gradient estimators. \ No newline at end of file diff --git a/data/2020/neurips/Black-Box Ripper: Copying black-box models using generative evolutionary algorithms b/data/2020/neurips/Black-Box Ripper: Copying black-box models using generative evolutionary algorithms new file mode 100644 index 0000000000..40fa1dbafd --- /dev/null +++ b/data/2020/neurips/Black-Box Ripper: Copying black-box models using generative evolutionary algorithms @@ -0,0 +1 @@ +We study the task of replicating the functionality of black-box neural models, for which we only know the output class probabilities provided for a set of input images.
We assume back-propagation through the black-box model is not possible and its training images are not available, e.g. the model could be exposed only through an API. In this context, we present a teacher-student framework that can distill the black-box (teacher) model into a student model with minimal accuracy loss. To generate useful data samples for training the student, our framework (i) learns to generate images on a proxy data set (with images and classes different from those used to train the black-box) and (ii) applies an evolutionary strategy to make sure that each generated data sample exhibits a high response for a specific class when given as input to the black box. Our framework is compared with several baseline and state-of-the-art methods on three benchmark data sets. The empirical evidence indicates that our model is superior to the considered baselines. Although our method does not back-propagate through the black-box network, it generally surpasses state-of-the-art methods that regard the teacher as a glass-box model. Our code is available at: this https URL. \ No newline at end of file diff --git a/data/2020/neurips/Blind Video Temporal Consistency via Deep Video Prior b/data/2020/neurips/Blind Video Temporal Consistency via Deep Video Prior new file mode 100644 index 0000000000..09e987213c --- /dev/null +++ b/data/2020/neurips/Blind Video Temporal Consistency via Deep Video Prior @@ -0,0 +1 @@ +Applying image processing algorithms independently to each video frame often leads to temporal inconsistency in the resulting video. To address this issue, we present a novel and general approach for blind video temporal consistency. Our method is trained directly on a pair of original and processed videos rather than on a large dataset. Unlike most previous methods that enforce temporal consistency with optical flow, we show that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior. Moreover, a carefully designed iteratively reweighted training strategy is proposed to address the challenging multimodal inconsistency problem. We demonstrate the effectiveness of our approach on 7 computer vision tasks on videos. Extensive quantitative and perceptual experiments show that our approach achieves superior performance to state-of-the-art methods on blind video temporal consistency. Our source codes are publicly available at this http URL. \ No newline at end of file diff --git a/data/2020/neurips/BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images b/data/2020/neurips/BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images new file mode 100644 index 0000000000..6c7533e37e --- /dev/null +++ b/data/2020/neurips/BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images @@ -0,0 +1 @@ +We present BlockGAN, an image generative model that learns object-aware 3D scene representations directly from unlabelled 2D images. Current work on scene representation learning either ignores scene background or treats the whole scene as one object. Meanwhile, work that considers scene compositionality treats scene objects only as image patches or 2D layers with alpha maps. Inspired by the computer graphics pipeline, we design BlockGAN to learn to first generate 3D features of background and foreground objects, then combine them into 3D features for the whole scene, and finally render them into realistic images.
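A structural sketch of the generate-combine-render pipeline just described; the module internals, the element-wise max combiner, and all names are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class CompositionalGenerator(nn.Module):
    def __init__(self, g_background, g_foreground, renderer):
        super().__init__()
        self.g_bg, self.g_fg = g_background, g_foreground
        self.renderer = renderer

    def forward(self, z_bg, z_fg_list, pose_list):
        # Per-object 3D feature generation: background plus each foreground.
        feats = [self.g_bg(z_bg)]
        feats += [self.g_fg(z, pose) for z, pose in zip(z_fg_list, pose_list)]
        # Combine object features into a single scene feature; element-wise
        # max is one simple permutation-invariant choice.
        scene = torch.stack(feats, dim=0).max(dim=0).values
        # Render the 3D scene feature down to a 2D image.
        return self.renderer(scene)
```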
This allows BlockGAN to reason over occlusion and interaction between objects' appearance, such as shadow and lighting, and provides control over each object's 3D pose and identity, while maintaining image realism. BlockGAN is trained end-to-end, using only unlabelled single images, without the need for 3D geometry, pose labels, object masks, or multiple views of the same scene. Our experiments show that using explicit 3D features to represent objects allows BlockGAN to learn disentangled representations both in terms of objects (foreground and background) and their properties (pose and identity). \ No newline at end of file diff --git a/data/2020/neurips/BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization b/data/2020/neurips/BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization new file mode 100644 index 0000000000..f84c41dc01 --- /dev/null +++ b/data/2020/neurips/BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization @@ -0,0 +1 @@ +One of the earliest commonly-used packages is Spearmint [94], which implements a variety of modeling techniques such as MCMC hyperparameter sampling and input warping [95]. Spearmint also supports parallel optimization via fantasies, and constrained optimization with the expected improvement and predictive entropy search acquisition functions [31, 38]. Spearmint was among the first libraries to make BO easily accessible to the end user. \ No newline at end of file diff --git a/data/2020/neurips/Boosting Adversarial Training with Hypersphere Embedding b/data/2020/neurips/Boosting Adversarial Training with Hypersphere Embedding new file mode 100644 index 0000000000..54a6bafaa0 --- /dev/null +++ b/data/2020/neurips/Boosting Adversarial Training with Hypersphere Embedding @@ -0,0 +1 @@ +Adversarial training (AT) is one of the most effective defenses against adversarial attacks for deep learning models. In this work, we advocate incorporating the hypersphere embedding (HE) mechanism into the AT procedure by regularizing the features onto compact manifolds, which constitutes a lightweight yet effective module to blend in the strength of representation learning. Our extensive analyses reveal that AT and HE are well coupled to benefit the robustness of the adversarially trained models from several aspects. We validate the effectiveness and adaptability of HE by embedding it into the popular AT frameworks including PGD-AT, ALP, and TRADES, as well as the FreeAT and FastAT strategies. In the experiments, we evaluate our methods under a wide range of adversarial attacks on the CIFAR-10 and ImageNet datasets, which verifies that integrating HE can consistently enhance the model robustness for each AT framework with little extra computation. \ No newline at end of file diff --git a/data/2020/neurips/Boosting First-Order Methods by Shifting Objective: New Schemes with Faster Worst-Case Rates b/data/2020/neurips/Boosting First-Order Methods by Shifting Objective: New Schemes with Faster Worst-Case Rates new file mode 100644 index 0000000000..9b40ea66c3 --- /dev/null +++ b/data/2020/neurips/Boosting First-Order Methods by Shifting Objective: New Schemes with Faster Worst-Case Rates @@ -0,0 +1 @@ +We propose a new methodology to design first-order methods for unconstrained strongly convex problems, i.e., to design them with respect to a shifted objective function. Several technical lemmas are provided as the building blocks for designing new methods.
By shifting the objective, the analysis is both tightened, which leaves room for faster rates, and simplified. Following this methodology, we derive several new accelerated schemes for problems equipped with various first-order oracles, and all of the derived methods have faster worst-case convergence rates than their existing counterparts. Experiments on machine learning tasks are conducted to evaluate the new methods. \ No newline at end of file diff --git a/data/2020/neurips/Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning b/data/2020/neurips/Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning new file mode 100644 index 0000000000..b24f54f0e1 --- /dev/null +++ b/data/2020/neurips/Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning @@ -0,0 +1 @@ +We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches $74.3\%$ top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and $79.6\%$ with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are given on GitHub. \ No newline at end of file diff --git a/data/2020/neurips/Bootstrapping neural processes b/data/2020/neurips/Bootstrapping neural processes new file mode 100644 index 0000000000..8652c57686 --- /dev/null +++ b/data/2020/neurips/Bootstrapping neural processes @@ -0,0 +1 @@ +Unlike traditional statistical modeling, for which a user typically hand-specifies a prior, Neural Processes (NPs) implicitly define a broad class of stochastic processes with neural networks. Given a data stream, NP learns a stochastic process that best describes the data. While this "data-driven" way of learning stochastic processes has proven to handle various types of data, NPs still rely on an assumption that uncertainty in stochastic processes is modeled by a single latent variable, which potentially limits the flexibility. To this end, we propose the Bootstrapping Neural Process (BNP), a novel extension of the NP family using the bootstrap. The bootstrap is a classical data-driven technique for estimating uncertainty, which allows BNP to learn the stochasticity in NPs without assuming a particular form. We demonstrate the efficacy of BNP on various types of data and its robustness in the presence of model-data mismatch. \ No newline at end of file diff --git a/data/2020/neurips/Boundary thickness and robustness in learning models b/data/2020/neurips/Boundary thickness and robustness in learning models new file mode 100644 index 0000000000..94d1a5da1a --- /dev/null +++ b/data/2020/neurips/Boundary thickness and robustness in learning models @@ -0,0 +1 @@ +Robustness of machine learning models to various adversarial and non-adversarial corruptions continues to be of interest.
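A compact illustration of the slow-moving-average target update that BYOL (above) relies on; the module names, momentum value, and linear stand-in encoder are assumptions for this sketch, not the paper's setup:

```python
import copy
import torch

@torch.no_grad()
def ema_update(online: torch.nn.Module, target: torch.nn.Module, tau: float = 0.996):
    """Make the target weights a slow-moving average of the online weights."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.mul_(tau).add_(p_online, alpha=1.0 - tau)

# The target starts as a frozen copy of the online network and is refreshed
# after every optimizer step; it never receives gradients of its own.
online = torch.nn.Linear(128, 64)                    # stand-in for the online encoder
target = copy.deepcopy(online).requires_grad_(False)
ema_update(online, target)
```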
In this paper, we introduce the notion of the boundary thickness of a classifier, and we describe its connection with and usefulness for model robustness. Thick decision boundaries lead to improved performance, while thin decision boundaries lead to overfitting (e.g., measured by the robust generalization gap between training and testing) and lower robustness. We show that a thicker boundary helps improve robustness against adversarial examples (e.g., improving the robust test accuracy of adversarial training) as well as so-called out-of-distribution (OOD) transforms, and we show that many commonly-used regularization and data augmentation procedures can increase boundary thickness. On the theoretical side, we establish that maximizing boundary thickness during training is akin to the so-called mixup training. Using these observations, we show that noise-augmentation on mixup training further increases boundary thickness, thereby combating vulnerability to various forms of adversarial attacks and OOD transforms. We can also show that the performance improvement in several lines of recent work happens in conjunction with a thicker boundary. \ No newline at end of file diff --git a/data/2020/neurips/BoxE: A Box Embedding Model for Knowledge Base Completion b/data/2020/neurips/BoxE: A Box Embedding Model for Knowledge Base Completion new file mode 100644 index 0000000000..96f2a4a8b5 --- /dev/null +++ b/data/2020/neurips/BoxE: A Box Embedding Model for Knowledge Base Completion @@ -0,0 +1 @@ +Knowledge base completion (KBC) aims to automatically infer missing facts by exploiting information already present in a knowledge base (KB). A promising approach for KBC is to embed knowledge into latent spaces and make predictions from learned embeddings. However, existing embedding models are subject to at least one of the following limitations: (1) theoretical inexpressivity, (2) lack of support for prominent inference patterns (e.g., hierarchies), (3) lack of support for KBC over higher-arity relations, and (4) lack of support for incorporating logical rules. Here, we propose a spatio-translational embedding model, called BoxE, that simultaneously addresses all these limitations. BoxE embeds entities as points, and relations as a set of hyper-rectangles (or boxes), which spatially characterize basic logical properties. This seemingly simple abstraction yields a fully expressive model offering a natural encoding for many desired logical properties. BoxE can both capture and inject rules from rich classes of rule languages, going well beyond individual inference patterns. By design, BoxE naturally applies to higher-arity KBs. We conduct a detailed experimental analysis, and show that BoxE achieves state-of-the-art performance, both on benchmark knowledge graphs and on more general KBs, and we empirically show the power of integrating logical rules. \ No newline at end of file diff --git a/data/2020/neurips/Breaking Reversibility Accelerates Langevin Dynamics for Non-Convex Optimization b/data/2020/neurips/Breaking Reversibility Accelerates Langevin Dynamics for Non-Convex Optimization new file mode 100644 index 0000000000..0364ad3ef0 --- /dev/null +++ b/data/2020/neurips/Breaking Reversibility Accelerates Langevin Dynamics for Non-Convex Optimization @@ -0,0 +1 @@ +Langevin dynamics (LD) has proven to be a powerful technique for optimizing a non-convex objective: it is an efficient algorithm for finding local minima that eventually visits a global minimum on longer time-scales.
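The boundary thickness notion introduced above admits a rough Monte-Carlo estimate along the segment between two differently-labelled inputs; the (alpha, beta) interval convention, the names, and the toy classifier below are assumptions rather than the paper's exact protocol:

```python
import torch

def boundary_thickness(model, x0, x1, y0, y1, alpha=0.0, beta=0.75, steps=128):
    """Estimate thickness as the fraction of the segment x(t) = (1-t)*x0 + t*x1
    on which the posterior gap p_{y0}(x) - p_{y1}(x) lies in (alpha, beta),
    scaled by the Euclidean length of the segment."""
    t = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x0.dim()))
    xs = (1 - t) * x0.unsqueeze(0) + t * x1.unsqueeze(0)   # points on the segment
    with torch.no_grad():
        probs = torch.softmax(model(xs), dim=1)
    gap = probs[:, y0] - probs[:, y1]
    inside = ((gap > alpha) & (gap < beta)).float().mean()
    return inside * (x1 - x0).norm()

model = torch.nn.Linear(10, 3)                             # toy classifier
print(boundary_thickness(model, torch.randn(10), torch.randn(10), y0=0, y1=1))
```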
LD is based on the first-order Langevin diffusion which is reversible in time. We study two variants that are based on non-reversible Langevin diffusions: the underdamped Langevin dynamics (ULD) and the Langevin dynamics with a non-symmetric drift (NLD). Adopting the techniques of Tzen et al. (2018) for LD to non-reversible diffusions, we show that for a given local minimum that is within an arbitrary distance from the initialization, with high probability, either the ULD trajectory ends up somewhere outside a small neighborhood of this local minimum within a recurrence time which depends on the smallest eigenvalue of the Hessian at the local minimum, or it enters this neighborhood by the recurrence time and stays there for a potentially exponentially long escape time. The ULD algorithm improves upon the recurrence time obtained for LD in Tzen et al. (2018) with respect to the dependency on the smallest eigenvalue of the Hessian at the local minimum. Similar results and improvements are obtained for the NLD algorithm. We also show that non-reversible variants can exit the basin of attraction of a local minimum faster in discrete time when the objective has two local minima separated by a saddle point and quantify the amount of improvement. Our analysis suggests that non-reversible Langevin algorithms are \ No newline at end of file diff --git a/data/2020/neurips/Breaking the Communication-Privacy-Accuracy Trilemma b/data/2020/neurips/Breaking the Communication-Privacy-Accuracy Trilemma new file mode 100644 index 0000000000..c76d6c5ce8 --- /dev/null +++ b/data/2020/neurips/Breaking the Communication-Privacy-Accuracy Trilemma @@ -0,0 +1 @@ +Two major challenges in distributed learning and estimation are 1) preserving the privacy of the local samples; and 2) communicating them efficiently to a central server, while achieving high accuracy for the end-to-end task. While there has been significant interest in addressing each of these challenges separately in the recent literature, treatments that simultaneously address both challenges are still largely missing. In this paper, we develop novel encoding and decoding mechanisms that simultaneously achieve optimal privacy and communication efficiency in various canonical settings. In particular, we consider the problems of mean estimation and frequency estimation under $\varepsilon$-local differential privacy and $b$-bit communication constraints. For mean estimation, we propose the SQKR mechanism, a scheme based on Kashin’s representation and random sampling, with order-optimal estimation error under both constraints. We further apply SQKR to distributed SGD and obtain a communication efficient and (locally) differentially private distributed SGD protocol. For frequency estimation, we present the RHR mechanism, a scheme that leverages the recursive structure of Walsh-Hadamard matrices and achieves order-optimal estimation error for all privacy levels and communication budgets. As a by-product, we also construct a distribution estimation mechanism that is rate-optimal for all privacy regimes and communication constraints, extending recent work that is limited to $b=1$ and $\varepsilon = O(1)$. Our results demonstrate that intelligent encoding under joint privacy and communication constraints can yield a performance that matches the optimal accuracy achievable under either constraint alone.
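For intuition on the underdamped (kinetic) Langevin dynamics studied above, here is a plain Euler-Maruyama discretization run on a double-well objective; the step size, friction, temperature, and test function are illustrative choices only, not the paper's algorithm:

```python
import numpy as np

def uld_step(x, v, grad_f, gamma=1.0, eta=1e-2, beta=1.0, rng=np.random):
    """One Euler-Maruyama step of underdamped Langevin dynamics:
        dv = -gamma*v dt - grad f(x) dt + sqrt(2*gamma/beta) dW,   dx = v dt."""
    noise = np.sqrt(2.0 * gamma * eta / beta) * rng.standard_normal(x.shape)
    v = v - eta * (gamma * v + grad_f(x)) + noise
    x = x + eta * v
    return x, v

# Double-well f(x) = (x^2 - 1)^2: the chain hops between the minima at +/- 1.
grad = lambda x: 4.0 * x * (x**2 - 1.0)
x, v = np.array([1.0]), np.zeros(1)
for _ in range(10_000):
    x, v = uld_step(x, v, grad)
```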
In other words, the optimal performance is determined by the more stringent of the two constraints, and the less stringent constraint can be satisfied for free. \ No newline at end of file diff --git a/data/2020/neurips/Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model b/data/2020/neurips/Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model new file mode 100644 index 0000000000..18734cd984 --- /dev/null +++ b/data/2020/neurips/Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model @@ -0,0 +1 @@ +This paper studies a central issue in modern reinforcement learning, sample efficiency, and makes progress toward solving an idealistic scenario that assumes access to a generative model or a simulator. Despite a large number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier in the sense that their claimed statistical guarantees hold only when the sample size exceeds some enormous threshold. The current paper overcomes this barrier and fully settles this problem; more specifically, we establish the minimax optimality of the model-based approach for any given target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible). \ No newline at end of file diff --git a/data/2020/neurips/Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning b/data/2020/neurips/Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning new file mode 100644 index 0000000000..a20b0341e4 --- /dev/null +++ b/data/2020/neurips/Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning @@ -0,0 +1 @@ +Sample efficiency has been one of the major challenges for deep reinforcement learning. Recently, model-based reinforcement learning has been proposed to address this challenge by performing planning on imaginary trajectories with a learned world model. However, world model learning may suffer from overfitting to training trajectories, and thus model-based value estimation and policy search will be prone to getting stuck in an inferior local policy. In this paper, we propose a novel model-based reinforcement learning algorithm, called BrIdging Reality and Dream (BIRD). It maximizes the mutual information between imaginary and real trajectories so that the policy improvement learned from imaginary trajectories can be easily generalized to real trajectories. We demonstrate that our approach improves sample efficiency of model-based planning, and achieves state-of-the-art performance on challenging visual control benchmarks. \ No newline at end of file diff --git a/data/2020/neurips/Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS b/data/2020/neurips/Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS new file mode 100644 index 0000000000..209758648c --- /dev/null +++ b/data/2020/neurips/Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS @@ -0,0 +1 @@ +Neural Architecture Search (NAS) has shown great potential in finding better neural network designs.
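The SQKR mechanism above relies on Kashin's representation for the d-dimensional case; as a self-contained scalar stand-in, here is the classical one-bit, eps-locally-private unbiased mechanism that such schemes build on — an illustration of trading one bit of communication and eps of privacy for accuracy, not the paper's construction:

```python
import numpy as np

def one_bit_ldp(x, eps, rng):
    """eps-LDP release of a scalar x in [-1, 1] using a single bit.
    The output is +/- c with c = (e^eps + 1)/(e^eps - 1), chosen so that
    the report is unbiased: E[output] = x."""
    c = (np.exp(eps) + 1.0) / (np.exp(eps) - 1.0)
    p_plus = 0.5 + x / (2.0 * c)            # probability of reporting +c
    return c if rng.random() < p_plus else -c

# Averaging n private reports yields an unbiased estimate of the true mean.
rng = np.random.default_rng(0)
data = rng.uniform(-1, 1, size=10_000)
reports = [one_bit_ldp(v, eps=1.0, rng=rng) for v in data]
print(np.mean(reports), data.mean())
```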
Sample-based NAS is the most reliable approach which aims at exploring the search space and evaluating the most promising architectures. However, it is computationally very costly. As a remedy, the one-shot approach has emerged as a popular technique for accelerating NAS using weight-sharing. However, due to the weight-sharing of vastly different networks, the one-shot approach is less reliable than the sample-based approach. In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework which is accelerated using weight-sharing to evaluate multiple related architectures simultaneously. Specifically, we apply a Graph Convolutional Network predictor as a surrogate model for Bayesian Optimization to select multiple related candidate models in each iteration. We then apply weight-sharing to train multiple candidate models simultaneously. This approach not only accelerates the traditional sample-based approach significantly, but also keeps its reliability. This is because weight-sharing among related architectures is more reliable than weight-sharing in the one-shot approach. Extensive experiments are conducted to verify the effectiveness of our method over many competing algorithms. \ No newline at end of file diff --git a/data/2020/neurips/Building powerful and equivariant graph neural networks with structural message-passing b/data/2020/neurips/Building powerful and equivariant graph neural networks with structural message-passing new file mode 100644 index 0000000000..ca0f31bcdf --- /dev/null +++ b/data/2020/neurips/Building powerful and equivariant graph neural networks with structural message-passing @@ -0,0 +1 @@ +Message-passing has proved to be an effective way to design graph neural networks, as it is able to leverage both permutation equivariance and an inductive bias towards learning local structures to achieve good generalization. However, current message-passing architectures have a limited representation power and fail to learn basic topological properties of graphs. We address this problem and propose a new message-passing framework that is powerful while preserving permutation equivariance. Specifically, we propagate unique node identifiers in the form of a one-hot encoding in order to learn a local context matrix around each node. This enables the network to learn rich local information about both features and topology, which can be pooled to obtain node representations. Experimentally, we find our model to be superior at predicting various graph topological properties, opening the way to novel powerful architectures that are both equivariant and computationally efficient. \ No newline at end of file diff --git a/data/2020/neurips/Byzantine Resilient Distributed Multi-Task Learning b/data/2020/neurips/Byzantine Resilient Distributed Multi-Task Learning new file mode 100644 index 0000000000..6e9377bdfc --- /dev/null +++ b/data/2020/neurips/Byzantine Resilient Distributed Multi-Task Learning @@ -0,0 +1 @@ +Distributed multi-task learning provides significant advantages in multi-agent networks with heterogeneous data sources where agents aim to learn distinct but correlated models simultaneously. However, distributed algorithms for learning relatedness among tasks are not resilient in the presence of Byzantine agents. In this paper, we present an approach for Byzantine resilient distributed multi-task learning. We propose an efficient online weight assignment rule by measuring the accumulated loss using an agent's data and its neighbors' models.
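A toy rendering of the structural message-passing idea above — propagating one-hot node identifiers so that every node accumulates a local context matrix; the symmetric normalization and number of rounds are arbitrary choices for this sketch, not the paper's architecture:

```python
import numpy as np

def propagate_identifiers(A, rounds=2):
    """Propagate one-hot node identifiers over a graph with adjacency A.
    Row i of the result says which nodes (by identity) sit in node i's
    neighbourhood after `rounds` hops -- a simple local context matrix."""
    deg = np.maximum(A.sum(axis=1), 1.0)
    A_hat = A / np.sqrt(np.outer(deg, deg))    # symmetric normalization
    C = np.eye(A.shape[0])                     # one-hot identifiers
    for _ in range(rounds):
        C = A_hat @ C + C                      # aggregate neighbours + self
    return C

# 4-cycle: each node's context row mixes the identities of its two neighbours.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
print(propagate_identifiers(A).round(2))
```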
A small accumulated loss indicates a large similarity between the two tasks. In order to ensure the Byzantine resilience of the aggregation at a normal agent, we introduce a step for filtering out larger losses. We analyze the approach for convex models and show that normal agents converge resiliently towards their true targets. Further, an agent's learning performance using the proposed weight assignment rule is guaranteed to be at least as good as in the non-cooperative case as measured by the expected regret. Finally, we demonstrate the approach using three case studies, including regression and classification problems, and show that our method exhibits good empirical performance for non-convex models, such as convolutional neural networks. \ No newline at end of file diff --git a/data/2020/neurips/CASTLE: Regularization via Auxiliary Causal Graph Discovery b/data/2020/neurips/CASTLE: Regularization via Auxiliary Causal Graph Discovery new file mode 100644 index 0000000000..a0d8db47b6 --- /dev/null +++ b/data/2020/neurips/CASTLE: Regularization via Auxiliary Causal Graph Discovery @@ -0,0 +1 @@ +Regularization improves generalization of supervised models to out-of-sample data. Prior works have shown that prediction in the causal direction (effect from cause) results in lower testing error than the anti-causal direction. However, existing regularization methods are agnostic of causality. We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables. CASTLE learns the causal directed acyclic graph (DAG) as an adjacency matrix embedded in the neural network's input layers, thereby facilitating the discovery of optimal predictors. Furthermore, CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features. We provide a theoretical generalization bound for our approach and conduct experiments on a plethora of synthetic and real publicly available datasets demonstrating that CASTLE consistently leads to better out-of-sample predictions compared to other popular benchmark regularizers. \ No newline at end of file diff --git a/data/2020/neurips/CHIP: A Hawkes Process Model for Continuous-time Networks with Scalable and Consistent Estimation b/data/2020/neurips/CHIP: A Hawkes Process Model for Continuous-time Networks with Scalable and Consistent Estimation new file mode 100644 index 0000000000..4db50f1a6e --- /dev/null +++ b/data/2020/neurips/CHIP: A Hawkes Process Model for Continuous-time Networks with Scalable and Consistent Estimation @@ -0,0 +1 @@ +In many application settings involving networks, such as messages between users of an on-line social network or transactions between traders in financial markets, the observed data consist of timestamped relational events, which form a continuous-time network. We propose the Community Hawkes Independent Pairs (CHIP) generative model for such networks. We show that applying spectral clustering to an aggregated adjacency matrix constructed from the CHIP model provides consistent community detection for a growing number of nodes and time duration. We also develop consistent and computationally efficient estimators for the model parameters.
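A loose sketch of the loss-based weight assignment with filtering described above: score each neighbour's model by its accumulated loss on the agent's own data, drop the largest losses as suspected Byzantine, and weight the rest. The softmax weighting and all names are assumptions, not the paper's exact rule:

```python
import numpy as np

def resilient_weights(losses, n_byzantine, temp=1.0):
    """Aggregation weights over neighbours given accumulated losses of their
    models on this agent's data: filter out the n_byzantine largest losses,
    then weight the survivors inversely to their loss via a softmax."""
    losses = np.asarray(losses, dtype=float)
    keep = np.argsort(losses)[: len(losses) - n_byzantine]
    w = np.zeros_like(losses)
    w[keep] = np.exp(-losses[keep] / temp)
    return w / w.sum()

# Two honest neighbours with similar tasks, one outlier reporting garbage.
print(resilient_weights([0.8, 1.1, 25.0], n_byzantine=1))   # outlier gets weight 0
```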
We demonstrate that our proposed CHIP model and estimation procedure scale to large networks with tens of thousands of nodes and provide better fits than existing continuous-time network models on several real networks. \ No newline at end of file diff --git a/data/2020/neurips/CLEARER: Multi-Scale Neural Architecture Search for Image Restoration b/data/2020/neurips/CLEARER: Multi-Scale Neural Architecture Search for Image Restoration new file mode 100644 index 0000000000..e328e4f621 --- /dev/null +++ b/data/2020/neurips/CLEARER: Multi-Scale Neural Architecture Search for Image Restoration @@ -0,0 +1 @@ +Multi-scale neural networks have shown effectiveness in image restoration tasks, which are usually designed and integrated in a handcrafted manner. Different from the existing labor-intensive handcrafted architecture design paradigms, we present a novel method, termed as multi-sCaLe nEural ARchitecture sEarch for image Restoration (CLEARER), which is a specifically designed neural architecture search (NAS) for image restoration. Our contributions are twofold. On one hand, we design a multi-scale search space that consists of three task-flexible modules. Namely, 1) a Parallel module that connects multi-resolution neural blocks in parallel, while preserving the channels and spatial resolution in each neural block; 2) a Transition module that retains the existing multi-resolution features while extending them to a lower resolution; 3) a Fusion module that integrates multi-resolution features by passing the features of the parallel neural blocks to the current neural blocks. On the other hand, we present novel losses which could 1) balance the tradeoff between the model complexity and performance, which is highly desirable for image restoration; and 2) relax the discrete architecture parameters into a continuous distribution which approximates either 0 or 1. As a result, a differentiable strategy could be employed to search when to fuse or extract multi-resolution features, while the discretization issue faced by the gradient-based NAS could be alleviated. The proposed CLEARER could search a promising architecture in two GPU hours. Extensive experiments show the promising performance of our method compared with nine image denoising methods and eight image deraining approaches in quantitative and qualitative evaluations. The code is available at https://github.com/limit-scu . \ No newline at end of file diff --git a/data/2020/neurips/COBE: Contextualized Object Embeddings from Narrated Instructional Video b/data/2020/neurips/COBE: Contextualized Object Embeddings from Narrated Instructional Video new file mode 100644 index 0000000000..d8c2ccaf40 --- /dev/null +++ b/data/2020/neurips/COBE: Contextualized Object Embeddings from Narrated Instructional Video @@ -0,0 +1 @@ +Many objects in the real world undergo dramatic variations in visual appearance. For example, a tomato may be red or green, sliced or chopped, fresh or fried, liquid or solid. Training a single detector to accurately recognize tomatoes in all these different states is challenging. On the other hand, contextual cues (e.g., the presence of a knife, a cutting board, a strainer or a pan) are often strongly indicative of how the object appears in the scene. Recognizing such contextual cues is useful not only to improve the accuracy of object detection or to determine the state of the object, but also to understand its functional properties and to infer ongoing or upcoming human-object interactions.
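The estimation pipeline described for CHIP — aggregate the timestamped events into a count matrix, then spectral-cluster it — is compact enough to sketch; the (u, v, t) event format and the cluster count are assumptions:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def chip_style_communities(events, n_nodes, n_communities):
    """Aggregate timestamped relational events (u, v, t) into a count matrix
    and apply spectral clustering to recover community labels."""
    A = np.zeros((n_nodes, n_nodes))
    for u, v, _t in events:
        A[u, v] += 1.0
    model = SpectralClustering(n_clusters=n_communities, affinity="precomputed")
    return model.fit_predict(A + A.T)          # symmetrize for clustering

events = [(0, 1, 0.3), (1, 0, 0.9), (2, 3, 1.2), (3, 2, 2.0), (0, 1, 2.5)]
print(chip_style_communities(events, n_nodes=4, n_communities=2))
```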
A fully-supervised approach to recognizing object states and their contexts in the real world is unfortunately marred by the long-tailed, open-ended distribution of the data, which would effectively require massive amounts of annotations to capture the appearance of objects in all their different forms. Instead of relying on manually-labeled data for this task, we propose a new framework for learning Contextualized OBject Embeddings (COBE) from automatically-transcribed narrations of instructional videos. We leverage the semantic and compositional structure of language by training a visual detector to predict a contextualized word embedding of the object and its associated narration. This enables the learning of an object representation where concepts relate according to a semantic language metric. Our experiments show that our detector learns to predict a rich variety of contextual object information, and that it is highly effective in the settings of few-shot and zero-shot learning. \ No newline at end of file diff --git a/data/2020/neurips/COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning b/data/2020/neurips/COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning new file mode 100644 index 0000000000..10c9dc5cd7 --- /dev/null +++ b/data/2020/neurips/COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning @@ -0,0 +1 @@ +Many real-world video-text tasks involve different levels of granularity, such as frames and words, clips and sentences, or videos and paragraphs, each with distinct semantics. In this paper, we propose a Cooperative hierarchical Transformer (COOT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities. The method consists of three major components: an attention-aware feature aggregation layer, which leverages the local temporal context (intra-level, e.g., within a clip), a contextual transformer to learn the interactions between low-level and high-level semantics (inter-level, e.g. clip-video, sentence-paragraph), and a cross-modal cycle-consistency loss to connect video and text. The resulting method compares favorably to the state of the art on several benchmarks while having few parameters. All code is available open-source at this https URL \ No newline at end of file diff --git a/data/2020/neurips/COPT: Coordinated Optimal Transport on Graphs b/data/2020/neurips/COPT: Coordinated Optimal Transport on Graphs new file mode 100644 index 0000000000..910481c6f4 --- /dev/null +++ b/data/2020/neurips/COPT: Coordinated Optimal Transport on Graphs @@ -0,0 +1 @@ +We introduce COPT, a novel distance metric between graphs defined via an optimization routine, computing a coordinated pair of optimal transport maps simultaneously. This gives an unsupervised way to learn general-purpose graph representations, applicable to both graph sketching and graph comparison. COPT involves simultaneously optimizing dual transport plans, one between the vertices of two graphs, and another between graph signal probability distributions. We show theoretically that our method preserves important global structural information on graphs, in particular spectral information, and analyze connections to existing studies. Empirically, COPT outperforms state-of-the-art methods in graph classification on both synthetic and real datasets.
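Transport-based graph distances such as COPT's are typically computed with entropy-regularized optimal transport and Sinkhorn-style iterative scaling; a generic Sinkhorn sketch between two discrete measures (an illustration of the machinery, not COPT's coordinated objective):

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.1, iters=200):
    """Entropy-regularized OT between histograms a and b under cost C:
    returns the plan P minimizing <P, C> - reg * H(P) with marginals a, b."""
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                # alternating scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

a, b = np.array([0.5, 0.5]), np.array([0.25, 0.75])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = sinkhorn(a, b, C)
print(P, P.sum(axis=1), P.sum(axis=0))   # marginals approach a and b
```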
\ No newline at end of file diff --git a/data/2020/neurips/COT-GAN: Generating Sequential Data via Causal Optimal Transport b/data/2020/neurips/COT-GAN: Generating Sequential Data via Causal Optimal Transport new file mode 100644 index 0000000000..54f74fe5cc --- /dev/null +++ b/data/2020/neurips/COT-GAN: Generating Sequential Data via Causal Optimal Transport @@ -0,0 +1 @@ +We introduce COT-GAN, an adversarial algorithm to train implicit generative models optimized for producing sequential data. The loss function of this algorithm is formulated using ideas from Causal Optimal Transport (COT), which combines classic optimal transport methods with an additional temporal causality constraint. Remarkably, we find that this causality condition provides a natural framework to parameterize the cost function that is learned by the discriminator as a robust (worst-case) distance, and an ideal mechanism for learning time dependent data distributions. Following Genevay et al. (2018), we also include an entropic penalization term which allows for the use of the Sinkhorn algorithm when computing the optimal transport cost. Our experiments show effectiveness and stability of COT-GAN when generating both low- and high-dimensional time series data. The success of the algorithm also relies on a new, improved version of the Sinkhorn divergence which demonstrates less bias in learning. \ No newline at end of file diff --git a/data/2020/neurips/CSER: Communication-efficient SGD with Error Reset b/data/2020/neurips/CSER: Communication-efficient SGD with Error Reset new file mode 100644 index 0000000000..12fce13ef2 --- /dev/null +++ b/data/2020/neurips/CSER: Communication-efficient SGD with Error Reset @@ -0,0 +1 @@ +The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. The key idea in CSER is, first, a new technique called "error reset" that adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of resulting local residual errors. Second, we introduce partial synchronization for both the gradients and the models, leveraging the advantages of both. We prove the convergence of CSER for smooth non-convex problems. Empirical results show that when combined with highly aggressive compressors, the CSER algorithms: i) cause no loss of accuracy, and ii) accelerate the training by nearly $10\times$ for CIFAR-100, and by $4.5\times$ for ImageNet. \ No newline at end of file diff --git a/data/2020/neurips/CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances b/data/2020/neurips/CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances new file mode 100644 index 0000000000..a324622e8d --- /dev/null +++ b/data/2020/neurips/CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances @@ -0,0 +1 @@ +Novelty detection, i.e., identifying whether a given sample is drawn from outside the training distribution, is essential for reliable machine learning. To this end, there have been many attempts at learning a representation well-suited for novelty detection and designing a score based on such representation. In this paper, we propose a simple, yet effective method named contrasting shifted instances (CSI), inspired by the recent success on contrastive learning of visual representations.
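The "error reset" idea in CSER can be sketched generically: wrap an arbitrary (possibly very aggressive) compressor, keep the compression residual locally, and reset it periodically. The top-k compressor and reset period here are illustrative assumptions; CSER proper folds the residual into the local model at reset rather than simply discarding it:

```python
import numpy as np

def topk_compress(g, k):
    """Keep the k largest-magnitude coordinates of g, zero out the rest."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

class ErrorResetCompressor:
    """Compress gradients while accumulating the residual locally; the
    residual is reset every `period` steps (simplified vs. CSER, which
    applies the residual to the local model instead of dropping it)."""
    def __init__(self, k, period=32):
        self.k, self.period, self.err, self.t = k, period, None, 0

    def __call__(self, grad):
        if self.err is None:
            self.err = np.zeros_like(grad)
        corrected = grad + self.err        # re-inject the past residual
        msg = topk_compress(corrected, self.k)
        self.err = corrected - msg         # residual never leaves the worker
        self.t += 1
        if self.t % self.period == 0:
            self.err[:] = 0.0              # periodic error reset
        return msg                         # the only thing communicated
```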
Specifically, in addition to contrasting a given sample with other instances as in conventional contrastive learning methods, our training scheme contrasts the sample with distributionally-shifted augmentations of itself. Based on this, we propose a new detection score that is specific to the proposed training scheme. Our experiments demonstrate the superiority of our method under various novelty detection scenarios, including unlabeled one-class, unlabeled multi-class and labeled multi-class settings, with various image benchmark datasets. \ No newline at end of file diff --git a/data/2020/neurips/CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations b/data/2020/neurips/CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations new file mode 100644 index 0000000000..9b29750e33 --- /dev/null +++ b/data/2020/neurips/CaSPR: Learning Canonical Spatiotemporal Point Cloud Representations @@ -0,0 +1 @@ +We propose CaSPR, a method to learn object-centric canonical spatiotemporal point cloud representations of dynamically moving or evolving objects. Our goal is to enable information aggregation over time and the interrogation of object state at any spatiotemporal neighborhood in the past, observed or not. Different from previous work, CaSPR learns representations that support spacetime continuity, are robust to variable and irregularly spacetime-sampled point clouds, and generalize to unseen object instances. Our approach divides the problem into two subtasks. First, we explicitly encode time by mapping an input point cloud sequence to a spatiotemporally-canonicalized object space. We then leverage this canonicalization to learn a spatiotemporal latent representation using neural ordinary differential equations and a generative model of dynamically evolving shapes using continuous normalizing flows. We demonstrate the effectiveness of our method on several applications including shape reconstruction, camera pose estimation, continuous spatiotemporal sequence reconstruction, and correspondence estimation from irregularly or intermittently sampled observations. \ No newline at end of file diff --git a/data/2020/neurips/Calibrated Reliable Regression using Maximum Mean Discrepancy b/data/2020/neurips/Calibrated Reliable Regression using Maximum Mean Discrepancy new file mode 100644 index 0000000000..4a5879eb08 --- /dev/null +++ b/data/2020/neurips/Calibrated Reliable Regression using Maximum Mean Discrepancy @@ -0,0 +1 @@ +Accurate quantification of uncertainty is crucial for real-world applications of machine learning. However, modern deep neural networks still produce unreliable predictive uncertainty, often yielding over-confident predictions. In this paper, we are concerned with getting well-calibrated predictions in regression tasks. We propose a calibrated regression method that minimizes the kernel embedding measure based on the maximum mean discrepancy. Theoretically, the calibration error of our method asymptotically converges to zero when the sample size is large enough. Experiments on non-trivial real datasets show that our method can produce well-calibrated and sharp prediction intervals, outperforming related state-of-the-art methods.
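The maximum mean discrepancy at the heart of the calibrated-regression method above has a standard unbiased estimator; a generic RBF-kernel version, with the bandwidth and names as assumptions:

```python
import numpy as np

def mmd2_unbiased(x, y, bw=1.0):
    """Unbiased estimate of squared MMD between samples x (n, d) and y (m, d)
    under the RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * bw^2))."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bw**2))
    Kxx, Kyy, Kxy = k(x, x), k(y, y), k(x, y)
    n, m = len(x), len(y)
    return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
            - 2.0 * Kxy.mean())

rng = np.random.default_rng(0)
print(mmd2_unbiased(rng.normal(0, 1, (200, 2)), rng.normal(0.5, 1, (200, 2))))
```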
\ No newline at end of file diff --git a/data/2020/neurips/Calibrating CNNs for Lifelong Learning b/data/2020/neurips/Calibrating CNNs for Lifelong Learning new file mode 100644 index 0000000000..7d85f07d57 --- /dev/null +++ b/data/2020/neurips/Calibrating CNNs for Lifelong Learning @@ -0,0 +1 @@ +We present an approach for lifelong/continual learning of convolutional neural networks (CNN) that does not suffer from the problem of catastrophic forgetting when moving from one task to the other. We show that the activation maps generated by the CNN trained on the old task can be calibrated using very few calibration parameters, to become relevant to the new task. Based on this, we calibrate the activation maps produced by each network layer using spatial and channel-wise calibration modules and train only these calibration parameters for each new task in order to perform lifelong learning. Our calibration modules introduce significantly less computation and fewer parameters compared to approaches that dynamically expand the network. Our approach is immune to catastrophic forgetting since we store the task-adaptive calibration parameters, which contain all the task-specific knowledge and are exclusive to each task. Further, our approach does not require storing data samples from the old tasks, which is done by many replay based methods. We perform extensive experiments on multiple benchmark datasets (SVHN, CIFAR, ImageNet, and MS-Celeb), all of which show substantial improvements over state-of-the-art methods (e.g., a 29% absolute increase in accuracy on CIFAR-100 with 10 classes at a time). On large-scale datasets, our approach yields 23.8% and 9.7% absolute increase in accuracy on ImageNet-100 and MS-Celeb-10K datasets, respectively, by employing very few (0.51% and 0.35% of model parameters) task-adaptive calibration parameters. \ No newline at end of file diff --git a/data/2020/neurips/Calibrating Deep Neural Networks using Focal Loss b/data/2020/neurips/Calibrating Deep Neural Networks using Focal Loss new file mode 100644 index 0000000000..9cfce1730d --- /dev/null +++ b/data/2020/neurips/Calibrating Deep Neural Networks using Focal Loss @@ -0,0 +1 @@ +Miscalibration - a mismatch between a model's confidence and its correctness - of Deep Neural Networks (DNNs) makes their predictions hard to rely on. Ideally, we want networks to be accurate, calibrated and confident. We show that, as opposed to the standard cross-entropy loss, focal loss [Lin et al., 2017] allows us to learn models that are already very well calibrated. When combined with temperature scaling, whilst preserving accuracy, it yields state-of-the-art calibrated models. We provide a thorough analysis of the factors causing miscalibration, and use the insights we glean from this to justify the empirically excellent performance of focal loss. To facilitate the use of focal loss in practice, we also provide a principled approach to automatically select the hyperparameter involved in the loss function. We perform extensive experiments on a variety of computer vision and NLP datasets, and with a wide variety of network architectures, and show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases.
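The focal loss underlying the calibration result above down-weights well-classified examples relative to cross-entropy, FL(p_t) = -(1 - p_t)^gamma * log(p_t); a minimal multi-class sketch with an illustrative gamma:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multi-class focal loss; reduces to cross-entropy when gamma = 0."""
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of true class
    pt = log_pt.exp()
    return (-((1.0 - pt) ** gamma) * log_pt).mean()

logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
focal_loss(logits, targets).backward()     # drop-in replacement for CE loss
```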
Code is available at this https URL \ No newline at end of file diff --git a/data/2020/neurips/Calibration of Shared Equilibria in General Sum Partially Observable Markov Games b/data/2020/neurips/Calibration of Shared Equilibria in General Sum Partially Observable Markov Games new file mode 100644 index 0000000000..e59a32d46d --- /dev/null +++ b/data/2020/neurips/Calibration of Shared Equilibria in General Sum Partially Observable Markov Games @@ -0,0 +1 @@ +Training multi-agent systems (MAS) to achieve realistic equilibria gives us a useful tool to understand and model real-world systems. We consider a general sum partially observable Markov game where agents of different types share a single policy network, conditioned on agent-specific information. This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets. Parameter sharing with decentralized execution has been introduced as an efficient way to train multiple agents using a single policy network. However, the nature of resulting equilibria reached by such agents is not yet understood: we introduce the novel concept of \textit{Shared equilibrium} as a symmetric pure Nash equilibrium of a certain Functional Form Game (FFG) and prove convergence to the latter for a certain class of games using self-play. In addition, it is important that such equilibria satisfy certain constraints so that MAS are \textit{calibrated} to real world data for practical use: we solve this problem by introducing a novel dual-Reinforcement Learning based approach that fits emergent behaviors of agents in a Shared equilibrium to externally-specified targets, and apply our methods to an $n$-player market example. We do so by calibrating parameters governing distributions of agent types rather than individual agents, which allows both behavior differentiation among agents and coherent scaling of the shared policy network to multiple agents. \ No newline at end of file diff --git a/data/2020/neurips/Can Graph Neural Networks Count Substructures? b/data/2020/neurips/Can Graph Neural Networks Count Substructures? new file mode 100644 index 0000000000..c72b765b49 --- /dev/null +++ b/data/2020/neurips/Can Graph Neural Networks Count Substructures? @@ -0,0 +1 @@ +The ability to detect and count certain substructures in graphs is important for solving many tasks on graph-structured data, especially in the contexts of computational chemistry and biology as well as social network analysis. Inspired by this, we propose to study the expressive power of graph neural networks (GNNs) via their ability to count attributed graph substructures, extending recent works that examine their power in graph isomorphism testing and function approximation. We distinguish between two types of substructure counting: induced-subgraph-count and subgraph-count, and establish both positive and negative answers for popular GNN architectures. Specifically, we prove that Message Passing Neural Networks (MPNNs), 2-Weisfeiler-Lehman (2-WL) and 2-Invariant Graph Networks (2-IGNs) cannot perform induced-subgraph-count of substructures consisting of 3 or more nodes, while they can perform subgraph-count of star-shaped substructures. As an intermediary step, we prove that 2-WL and 2-IGNs are equivalent in distinguishing non-isomorphic graphs, partly answering an open problem raised in Maron et al. (2019).
We also prove positive results for k-WL and k-IGNs as well as negative results for k-WL with a finite number of iterations. We then conduct experiments that support the theoretical results for MPNNs and 2-IGNs. Moreover, motivated by substructure counting, we propose a local relational pooling approach with inspirations from Murphy et al. (2019) and demonstrate that it is not only effective for substructure counting but also able to achieve competitive performance on real-world tasks. \ No newline at end of file diff --git a/data/2020/neurips/Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference b/data/2020/neurips/Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference @@ -0,0 +1 @@ +We investigate the problem of reliably assessing group fairness when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores for unlabeled examples in each group using a hierarchical latent variable model conditioned on labeled examples. This in turn allows for inference of posterior distributions with associated notions of uncertainty for a variety of group fairness metrics. We demonstrate that our approach leads to significant and consistent reductions in estimation error across multiple well-known fairness datasets, sensitive attributes, and predictive models. The results show the benefits of using both unlabeled data and Bayesian inference in terms of assessing whether a prediction model is fair or not. \ No newline at end of file diff --git a/data/2020/neurips/Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study b/data/2020/neurips/Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study new file mode 100644 index 0000000000..0a29d3f2f0 --- /dev/null +++ b/data/2020/neurips/Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study @@ -0,0 +1 @@ +The notion of implicit bias, or implicit regularization, has been suggested as a means to explain the surprising generalization ability of modern-day overparameterized learning algorithms. This notion refers to the tendency of the optimization algorithm towards a certain structured solution that often generalizes well. Recently, several papers have studied implicit regularization and were able to identify this phenomenon in various scenarios. We revisit this paradigm in arguably the simplest non-trivial setup, and study the implicit bias of Stochastic Gradient Descent (SGD) in the context of Stochastic Convex Optimization. As a first step, we provide a simple construction that rules out the existence of a \emph{distribution-independent} implicit regularizer that governs the generalization ability of SGD. We then demonstrate a learning problem that rules out a very general class of \emph{distribution-dependent} implicit regularizers from explaining generalization, which includes strongly convex regularizers as well as non-degenerate norm-based regularizations.
Certain aspects of our constructions point to significant difficulties in providing a comprehensive explanation of an algorithm's generalization performance by solely arguing about its implicit regularization properties. \ No newline at end of file diff --git a/data/2020/neurips/Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? b/data/2020/neurips/Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? new file mode 100644 index 0000000000..b692a26b08 --- /dev/null +++ b/data/2020/neurips/Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? @@ -0,0 +1 @@ +We present Graph-Q-SAT, a branching heuristic for a Boolean SAT solver trained with value-based reinforcement learning (RL) using Graph Neural Networks for function approximation. Solvers using Graph-Q-SAT are complete SAT solvers that either provide a satisfying assignment or proof of unsatisfiability, which is required for many SAT applications. The branching heuristics commonly used in SAT solvers make poor decisions during their warm-up period, whereas Graph-Q-SAT is trained to examine the structure of the particular problem instance to make better decisions early in the search. Training Graph-Q-SAT is data efficient and does not require elaborate dataset preparation or feature engineering. We train Graph-Q-SAT using RL interfacing with the MiniSat solver and show that Graph-Q-SAT can reduce the number of iterations required to solve SAT problems by 2-3X. Furthermore, it generalizes to unsatisfiable SAT instances, as well as to problems with 5X more variables than it was trained on. We show that for larger problems, reductions in the number of iterations lead to wall clock time reductions, the ultimate goal when designing heuristics. We also show positive zero-shot transfer behavior when testing Graph-Q-SAT on a task family different from that used for training. While more work is needed to apply Graph-Q-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search. \ No newline at end of file diff --git a/data/2020/neurips/Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory b/data/2020/neurips/Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory new file mode 100644 index 0000000000..6db5f7e5b4 --- /dev/null +++ b/data/2020/neurips/Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory @@ -0,0 +1,3 @@ +Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks. At the core of their empirical successes is the learned feature representation, which embeds rich observations, e.g., images and texts, into the latent space that encodes semantic structures. Meanwhile, the evolution of such a feature representation is crucial to the convergence of temporal-difference and Q-learning. +In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise. We aim to answer the following questions: When the function approximator is a neural network, how does the associated feature representation evolve?
If it converges, does it converge to the optimal one? +We prove that, utilizing an overparameterized two-layer neural network, temporal-difference and Q-learning globally minimize the mean-squared projected Bellman error at a sublinear rate. Moreover, the associated feature representation converges to the optimal one, generalizing the previous analysis of Cai et al. (2019) in the neural tangent kernel regime, where the associated feature representation stabilizes at the initial one. The key to our analysis is a mean-field perspective, which connects the evolution of a finite-dimensional parameter to its limiting counterpart over an infinite-dimensional Wasserstein space. Our analysis generalizes to soft Q-learning, which is further connected to policy gradient. \ No newline at end of file diff --git a/data/2020/neurips/Can the Brain Do Backpropagation? - Exact Implementation of Backpropagation in Predictive Coding Networks b/data/2020/neurips/Can the Brain Do Backpropagation? - Exact Implementation of Backpropagation in Predictive Coding Networks new file mode 100644 index 0000000000..e34155ec3c --- /dev/null +++ b/data/2020/neurips/Can the Brain Do Backpropagation? - Exact Implementation of Backpropagation in Predictive Coding Networks @@ -0,0 +1 @@ +Backpropagation (BP) has been the most successful algorithm used to train artificial neural networks. However, there are several gaps between BP and learning in biologically plausible neuronal networks of the brain (learning in the brain, or simply BL, for short), in particular, (1) it has been unclear to date whether BP can be implemented exactly via BL, (2) there is a lack of local plasticity in BP, i.e., weight updates require information that is not locally available, while BL utilizes only locally available information, and (3) there is a lack of autonomy in BP, i.e., some external control over the neural network is required (e.g., switching between prediction and learning stages requires changes to dynamics and synaptic plasticity rules), while BL works fully autonomously. Bridging such gaps, i.e., understanding how BP can be approximated by BL, has been of major interest in both neuroscience and machine learning. Despite tremendous efforts, however, no previous model has bridged the gaps to the degree of demonstrating an equivalence to BP; instead, only approximations to BP have been shown. Here, we present for the first time a framework within BL that bridges the above crucial gaps. We propose a BL model that (1) produces exactly the same updates of the neural weights as BP, while (2) employing local plasticity, i.e., all neurons perform only local computations, done simultaneously. We then modify it to an alternative BL model that (3) also works fully autonomously. Overall, our work provides important evidence for the debate on the long-disputed question of whether the brain can perform BP.
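The flavour of learning discussed above — relax local prediction errors to equilibrium, then update weights from purely local quantities — is easiest to see in a generic two-layer predictive-coding loop. This is the textbook scheme, not the paper's exact BP-equivalent construction, and all names and hyperparameters are arbitrary:

```python
import numpy as np

def pc_step(x, y, W1, W2, infer_steps=50, lr_v=0.1, lr_w=1e-3):
    """One predictive-coding update for a two-layer network x -> v -> y.
    Energy: E = ||v - W1 x||^2 / 2 + ||y - W2 tanh(v)||^2 / 2."""
    f = np.tanh
    v = W1 @ x                                # initial guess for the hidden layer
    for _ in range(infer_steps):              # inference: relax v to minimize E
        e1 = v - W1 @ x                       # local error at the hidden layer
        e2 = y - W2 @ f(v)                    # local error at the output layer
        v += lr_v * (-e1 + (1.0 - f(v) ** 2) * (W2.T @ e2))
    W1 += lr_w * np.outer(e1, x)              # weight updates use only local errors
    W2 += lr_w * np.outer(e2, f(v))
    return W1, W2

rng = np.random.default_rng(0)
W1, W2 = 0.1 * rng.normal(size=(4, 3)), 0.1 * rng.normal(size=(2, 4))
W1, W2 = pc_step(rng.normal(size=3), rng.normal(size=2), W1, W2)
```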
\ No newline at end of file diff --git a/data/2020/neurips/Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction b/data/2020/neurips/Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction new file mode 100644 index 0000000000..1ef454e44e --- /dev/null +++ b/data/2020/neurips/Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction @@ -0,0 +1 @@ +We propose the Canonical 3D Deformer Map, a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects. Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings, combining their individual advantages. In particular, it learns to associate each image pixel with a deformation model of the corresponding 3D object point which is canonical, i.e. intrinsic to the identity of the point and shared across objects of the category. The result is a method that, given only sparse 2D supervision at training time, can, at test time, reconstruct the 3D shape and texture of objects from single views, while establishing meaningful dense correspondences between object instances. It also achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds. \ No newline at end of file diff --git a/data/2020/neurips/Cascaded Text Generation with Markov Transformers b/data/2020/neurips/Cascaded Text Generation with Markov Transformers new file mode 100644 index 0000000000..a18cdb6d7a --- /dev/null +++ b/data/2020/neurips/Cascaded Text Generation with Markov Transformers @@ -0,0 +1 @@ +The two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies. This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output. To parameterize this cascade, we introduce a Markov transformer, a variant of the popular fully autoregressive model that allows us to simultaneously decode with specific autoregressive context cutoffs. This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets. \ No newline at end of file diff --git a/data/2020/neurips/Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning b/data/2020/neurips/Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning new file mode 100644 index 0000000000..6cb6dc4ab9 --- /dev/null +++ b/data/2020/neurips/Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning @@ -0,0 +1 @@ +One fundamental problem in the empirical sciences is that of reconstructing the causal structure that underlies a phenomenon of interest through observation and experimentation.
While there exists a plethora of methods capable of learning the equivalence class of causal structures that are compatible with observations, it is less well-understood how to systematically combine observations and experiments to reconstruct the underlying structure. In this paper, we investigate the task of structural learning in non-Markovian systems (i.e., when latent variables affect more than one observable) from a combination of observational and soft experimental data when the interventional targets are unknown. Using causal invariances found across the collection of observational and interventional distributions (not only conditional independences), we define a property called Ψ-Markov that connects these distributions to a pair consisting of (1) a causal graph D and (2) a set of interventional targets I. Building on this property, our main contributions are two-fold: First, we provide a graphical characterization that allows one to test whether two causal graphs with possibly different sets of interventional targets belong to the same Ψ-Markov equivalence class. Second, we develop an algorithm capable of harnessing the collection of data to learn the corresponding equivalence class. We then prove that this algorithm is sound and complete, in the sense that it is the most informative in the sample limit, i.e., it discovers as many tails and arrowheads as can be oriented within a Ψ-Markov equivalence class. \ No newline at end of file diff --git a/data/2020/neurips/Causal Discovery in Physical Systems from Videos b/data/2020/neurips/Causal Discovery in Physical Systems from Videos new file mode 100644 index 0000000000..c3dfb03d84 --- /dev/null +++ b/data/2020/neurips/Causal Discovery in Physical Systems from Videos @@ -0,0 +1 @@ +Causal discovery is at the core of human cognition. It enables us to reason about the environment and make counterfactual predictions about unseen scenarios, that can vastly differ from our previous experiences. We consider the task of causal discovery from videos in an end-to-end fashion without supervision on the ground-truth graph structure. In particular, our goal is to discover the structural dependencies among environmental and object variables: inferring the type and strength of interactions that have a causal effect on the behavior of the dynamical system. Our model consists of (a) a perception module that extracts a semantically meaningful and temporally consistent keypoint representation from images, (b) an inference module for determining the graph distribution induced by the detected keypoints, and (c) a dynamics module that can predict the future by conditioning on the inferred graph. We assume access to different configurations and environmental conditions, i.e., data from unknown interventions on the underlying system; thus, we can hope to discover the correct underlying causal graph without explicit interventions. We evaluate our method in a planar multi-body interaction environment and scenarios involving fabrics of different shapes like shirts and pants. Experiments demonstrate that our model can correctly identify the interactions from a short sequence of images and make long-term future predictions. The causal structure assumed by the model also allows it to make counterfactual predictions and extrapolate to systems of unseen interaction graphs or graphs of various sizes.
\ No newline at end of file diff --git a/data/2020/neurips/Causal Estimation with Functional Confounders b/data/2020/neurips/Causal Estimation with Functional Confounders new file mode 100644 index 0000000000..ab2b2af25d --- /dev/null +++ b/data/2020/neurips/Causal Estimation with Functional Confounders @@ -0,0 +1 @@ +Causal inference relies on two fundamental assumptions: ignorability and positivity. We study causal inference when the true confounder value can be expressed as a function of the observed data; we call this setting estimation with functional confounders (EFC). In this setting, ignorability is satisfied but positivity is violated, and causal inference is impossible in general. We consider two scenarios where causal effects are estimable. First, we discuss interventions on a part of the treatment called functional interventions and a sufficient condition for effect estimation of these interventions called functional positivity. Second, we develop conditions for nonparametric effect estimation based on the gradient fields of the functional confounder and the true outcome function. To estimate effects under these conditions, we develop Level-set Orthogonal Descent Estimation (LODE). Further, we prove error bounds on LODE's effect estimates, evaluate our methods on simulated and real data, and empirically demonstrate the value of EFC. \ No newline at end of file diff --git a/data/2020/neurips/Causal Intervention for Weakly-Supervised Semantic Segmentation b/data/2020/neurips/Causal Intervention for Weakly-Supervised Semantic Segmentation new file mode 100644 index 0000000000..3a1c19fef0 --- /dev/null +++ b/data/2020/neurips/Causal Intervention for Weakly-Supervised Semantic Segmentation @@ -0,0 +1 @@ +We present a causal inference framework to improve Weakly-Supervised Semantic Segmentation (WSSS). Specifically, we aim to generate better pixel-level pseudo-masks by using only image-level labels -- the most crucial step in WSSS. We attribute the cause of the ambiguous boundaries of pseudo-masks to the confounding context, e.g., the correct image-level classification of "horse" and "person" may be due not only to the recognition of each instance, but also to their co-occurrence context, making it hard for model inspection tools (e.g., CAM) to distinguish the boundaries. Inspired by this, we propose a structural causal model to analyze the causalities among images, contexts, and class labels. Based on it, we develop a new method: Context Adjustment (CONTA), to remove the confounding bias in image-level classification and thus provide better pseudo-masks as ground-truth for the subsequent segmentation model. On PASCAL VOC 2012 and MS-COCO, we show that CONTA boosts various popular WSSS methods to new state-of-the-art results. \ No newline at end of file diff --git a/data/2020/neurips/Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models b/data/2020/neurips/Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models new file mode 100644 index 0000000000..a743f8bb8e --- /dev/null +++ b/data/2020/neurips/Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models @@ -0,0 +1,2 @@ +Shapley values underlie one of the most popular model-agnostic methods within explainable artificial intelligence. These values are designed to attribute the difference between a model's prediction and an average baseline to the different features used as input to the model.
Being based on solid game-theoretic principles, Shapley values uniquely satisfy several desirable properties, which is why they are increasingly used to explain the predictions of possibly complex and highly non-linear machine learning models. Shapley values are well calibrated to a user's intuition when features are independent, but may lead to undesirable, counterintuitive explanations when the independence assumption is violated. +In this paper, we propose a novel framework for computing Shapley values that generalizes recent work that aims to circumvent the independence assumption. By employing Pearl's do-calculus, we show how these 'causal' Shapley values can be derived for general causal graphs without sacrificing any of their desirable properties. Moreover, causal Shapley values enable us to separate the contribution of direct and indirect effects. We provide a practical implementation for computing causal Shapley values based on causal chain graphs when only partial information is available and illustrate their utility on a real-world example. \ No newline at end of file diff --git a/data/2020/neurips/Causal analysis of Covid-19 Spread in Germany b/data/2020/neurips/Causal analysis of Covid-19 Spread in Germany new file mode 100644 index 0000000000..22983d3da9 --- /dev/null +++ b/data/2020/neurips/Causal analysis of Covid-19 Spread in Germany @@ -0,0 +1 @@ +In this work, we study the causal relations among German regions in terms of the spread of Covid-19 since the beginning of the pandemic, taking into account the restriction policies that were applied by the different federal states. We propose and prove a new theorem for a causal feature selection method for time series data, robust to latent confounders, which we subsequently apply to Covid-19 case numbers. We present findings about the spread of the virus in Germany and the causal impact of restriction measures, discussing the role of various policies in containing the spread. Since our results are based on rather limited target time series (only the numbers of reported cases), care should be exercised in interpreting them. However, it is encouraging that even such limited data seems to contain causal signals. This suggests that as more data becomes available, our causal approach may contribute towards meaningful causal analysis of political interventions on the development of Covid-19, and thus also towards the development of rational and data-driven methodologies for choosing interventions. \ No newline at end of file diff --git a/data/2020/neurips/Certifiably Adversarially Robust Detection of Out-of-Distribution Data b/data/2020/neurips/Certifiably Adversarially Robust Detection of Out-of-Distribution Data new file mode 100644 index 0000000000..9d7f423488 --- /dev/null +++ b/data/2020/neurips/Certifiably Adversarially Robust Detection of Out-of-Distribution Data @@ -0,0 +1 @@ +Deep neural networks are known to be overconfident when applied to out-of-distribution (OOD) inputs which clearly do not belong to any class. This is a problem in safety-critical applications since a reliable assessment of the uncertainty of a classifier is a key property, allowing the system to trigger human intervention or to transfer into a safe state. In this paper, we aim for certifiable worst case guarantees for OOD detection by enforcing not only low confidence at the OOD point but also in an $l_\infty$-ball around it.
For this purpose, we use interval bound propagation (IBP) to upper bound the maximal confidence in the $l_\infty$-ball and minimize this upper bound during training time. We show that non-trivial bounds on the confidence for OOD data generalizing beyond the OOD dataset seen at training time are possible. Moreover, in contrast to certified adversarial robustness, which typically comes with a significant loss in prediction performance, certified guarantees for worst case OOD detection are possible without much loss in accuracy. \ No newline at end of file diff --git a/data/2020/neurips/Certified Defense to Image Transformations via Randomized Smoothing b/data/2020/neurips/Certified Defense to Image Transformations via Randomized Smoothing new file mode 100644 index 0000000000..8cc6c982c2 --- /dev/null +++ b/data/2020/neurips/Certified Defense to Image Transformations via Randomized Smoothing @@ -0,0 +1 @@ +We extend randomized smoothing to cover parameterized transformations (e.g., rotations, translations) and certify robustness in the parameter space (e.g., rotation angle). This is particularly challenging as interpolation and rounding effects mean that image transformations do not compose, in turn preventing direct certification of the perturbed image (unlike certification with $\ell^p$ norms). We address this challenge by introducing three different defenses, each with a different guarantee (heuristic, distributional and individual) stemming from the method used to bound the interpolation error. Importantly, in the individual case, we show how to efficiently compute the inverse of an image transformation, enabling us to provide individual guarantees in the online setting. We provide an implementation of all methods at https://github.com/eth-sri/transformation-smoothing. \ No newline at end of file diff --git a/data/2020/neurips/Certified Monotonic Neural Networks b/data/2020/neurips/Certified Monotonic Neural Networks new file mode 100644 index 0000000000..5183a89d7f --- /dev/null +++ b/data/2020/neurips/Certified Monotonic Neural Networks @@ -0,0 +1 @@ +Learning monotonic models with respect to a subset of the inputs is a desirable feature to effectively address the fairness, interpretability, and generalization issues in practice. Existing methods for learning monotonic neural networks either require specifically designed model structures to ensure monotonicity, which can be too restrictive/complicated, or enforce monotonicity by adjusting the learning process, which cannot provably guarantee the learned model is monotonic on selected features. In this work, we propose to certify the monotonicity of general piecewise-linear neural networks by solving a mixed integer linear programming problem. This provides a new general approach for learning monotonic neural networks with arbitrary model structures. Our method allows us to train neural networks with heuristic monotonicity regularizations, and we can gradually increase the regularization magnitude until the learned network is certified monotonic. Compared to prior works, our approach does not require human-designed constraints on the weight space and also yields a more accurate approximation. Empirical studies on various datasets demonstrate the efficiency of our approach over the state-of-the-art methods, such as Deep Lattice Networks.
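A minimal sketch of the interval bound propagation step that the OOD-certification abstract above builds on, assuming a toy two-layer ReLU network with made-up weights and input; the actual method minimizes upper bounds of this kind during training.

```python
# Minimal IBP sketch: propagate an l_inf-ball through affine + ReLU layers.
# Weights, sizes, and epsilon are illustrative, not from the paper.
import numpy as np

def ibp_affine(lo, hi, W, b):
    # Split W into positive/negative parts to propagate the box exactly.
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def ibp_relu(lo, hi):
    return np.maximum(lo, 0), np.maximum(hi, 0)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)

x, eps = rng.normal(size=8), 0.1
lo, hi = x - eps, x + eps                 # l_inf-ball around the input
lo, hi = ibp_relu(*ibp_affine(lo, hi, W1, b1))
lo, hi = ibp_affine(lo, hi, W2, b2)
print("certified logit upper bounds:", hi)  # this is what training minimizes
```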
\ No newline at end of file diff --git a/data/2020/neurips/Certified Robustness of Graph Convolution Networks for Graph Classification under Topological Attacks b/data/2020/neurips/Certified Robustness of Graph Convolution Networks for Graph Classification under Topological Attacks new file mode 100644 index 0000000000..d7a8ba6541 --- /dev/null +++ b/data/2020/neurips/Certified Robustness of Graph Convolution Networks for Graph Classification under Topological Attacks @@ -0,0 +1 @@ +Graph convolution networks (GCNs) have become effective models for graph classification. Similar to many deep networks, GCNs are vulnerable to adversarial attacks on graph topology and node attributes. Recently, a number of effective attack and defense algorithms have been developed, but certificates of robustness against topological perturbations are currently available only for PageRank and label/feature propagation, while none has been designed for GCNs. We propose the first algorithm for certifying the robustness of GCNs to topological attacks in the application of graph classification. Our method is based on Lagrange dualization and convex envelope, which result in tight approximation bounds that are efficiently computable by dynamic programming. When used in conjunction with robust training, it allows an increased number of graphs to be certified as robust. \ No newline at end of file diff --git a/data/2020/neurips/Certifying Confidence via Randomized Smoothing b/data/2020/neurips/Certifying Confidence via Randomized Smoothing new file mode 100644 index 0000000000..fb6b032958 --- /dev/null +++ b/data/2020/neurips/Certifying Confidence via Randomized Smoothing @@ -0,0 +1 @@ +Randomized smoothing has been shown to provide good certified-robustness guarantees for high-dimensional classification problems. It uses the probabilities of predicting the top two most-likely classes around an input point under a smoothing distribution to generate a certified radius for a classifier's prediction. However, most smoothing methods do not give us any information about the \emph{confidence} with which the underlying classifier (e.g., deep neural network) makes a prediction. In this work, we propose a method to generate certified radii for the prediction confidence of the smoothed classifier. We consider two notions for quantifying confidence: average prediction score of a class and the margin by which the average prediction score of one class exceeds that of another. We modify the Neyman-Pearson lemma (a key theorem in randomized smoothing) to design a procedure for computing the certified radius where the confidence is guaranteed to stay above a certain threshold. Our experimental results on CIFAR-10 and ImageNet datasets show that using information about the distribution of the confidence scores allows us to achieve a significantly better certified radius than ignoring it. Thus, we demonstrate that extra information about the base classifier at the input point can help improve certified guarantees for the smoothed classifier. \ No newline at end of file diff --git a/data/2020/neurips/Certifying Strategyproof Auction Networks b/data/2020/neurips/Certifying Strategyproof Auction Networks new file mode 100644 index 0000000000..a2e2e26b55 --- /dev/null +++ b/data/2020/neurips/Certifying Strategyproof Auction Networks @@ -0,0 +1 @@ +Optimal auctions maximize a seller's expected revenue subject to individual rationality and strategyproofness for the buyers.
Myerson's seminal work in 1981 settled the case of auctioning a single item; however, subsequent decades of work have yielded little progress moving beyond a single item, leaving the design of revenue-maximizing auctions as a central open problem in the field of mechanism design. A recent thread of work in "differentiable economics" has used tools from modern deep learning to instead learn good mechanisms. We focus on the RegretNet architecture, which can represent auctions with arbitrary numbers of items and participants; it is trained to be empirically strategyproof, but the property is never exactly verified, leaving potential loopholes for market participants to exploit. We propose ways to explicitly verify strategyproofness under a particular valuation profile using techniques from the neural network verification literature. Doing so requires making several modifications to the RegretNet architecture in order to represent it exactly in an integer program. We train our network and produce certificates in several settings, including settings for which the optimal strategyproof mechanism is not known. \ No newline at end of file diff --git a/data/2020/neurips/Chaos, Extremism and Optimism: Volume Analysis of Learning in Games b/data/2020/neurips/Chaos, Extremism and Optimism: Volume Analysis of Learning in Games new file mode 100644 index 0000000000..7660d590ed --- /dev/null +++ b/data/2020/neurips/Chaos, Extremism and Optimism: Volume Analysis of Learning in Games @@ -0,0 +1,3 @@ +We present volume analyses of Multiplicative Weights Updates (MWU) and Optimistic Multiplicative Weights Updates (OMWU) in zero-sum as well as coordination games. Such analyses provide new insights into these game dynamical systems, which seem hard to achieve via the classical techniques within Computer Science and Machine Learning. +The first step is to examine these dynamics not in their original space (simplex of actions) but in a dual space (aggregate payoff space of actions). The second step is to explore how the volume of a set of initial conditions evolves over time when it is pushed forward according to the algorithm. This is reminiscent of approaches in Evolutionary Game Theory where replicator dynamics, the continuous-time analogue of MWU, is known to always preserve volume in all games. Interestingly, when we examine discrete-time dynamics, both the choice of the game and the choice of the algorithm play a critical role. So whereas MWU expands volume in zero-sum games and is thus Lyapunov chaotic, we show that OMWU contracts volume, providing an alternative understanding for its known convergent behavior. However, we also prove a no-free-lunch type of theorem, in the sense that when examining coordination games the roles are reversed: OMWU expands volume exponentially fast, whereas MWU contracts. +Using these tools, we prove two novel, rather negative properties of MWU in zero-sum games: (1) Extremism: even in games with a unique fully mixed Nash equilibrium, the system recurrently gets stuck near pure-strategy profiles, despite them being clearly unstable from a game-theoretic perspective. (2) Unavoidability: given any set of good points (with your own interpretation of "good"), the system cannot avoid bad points indefinitely.
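To make the dynamics discussed in the last abstract concrete, here is a minimal, hedged sketch of plain MWU in matching pennies, a zero-sum game with a unique fully mixed equilibrium; the step size, horizon, and slightly perturbed starting point are illustrative choices, not the paper's.

```python
# Sketch of Multiplicative Weights Updates in matching pennies.
import numpy as np

A = np.array([[1., -1.], [-1., 1.]])   # payoff matrix for the row player
eta = 0.1
x = np.array([0.6, 0.4])               # start slightly off the equilibrium
y = np.array([0.5, 0.5])

for t in range(1000):
    gx, gy = A @ y, -A.T @ x           # each player's payoff vector
    x = x * np.exp(eta * gx); x /= x.sum()   # MWU step for the row player
    y = y * np.exp(eta * gy); y /= y.sum()   # MWU step for the column player

# Plain MWU spirals away from the mixed equilibrium (0.5, 0.5) toward the
# simplex boundary ("Extremism"); the optimistic variant, which adds the
# extra term eta*(g_t - g_{t-1}), is known to converge instead.
print(x, y)
```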
\ No newline at end of file diff --git a/data/2020/neurips/Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe b/data/2020/neurips/Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe new file mode 100644 index 0000000000..3973b81f3d --- /dev/null +++ b/data/2020/neurips/Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe @@ -0,0 +1 @@ +Intelligent agents are continuously faced with the challenge of optimizing a policy based on what they can observe (see) and which actions they can take (do) in the environment where they are deployed. Most policies can be parametrized in terms of these two dimensions, i.e., as a function of what can be seen and done given a certain situation, which we call a mixed policy. In this paper, we investigate several properties of the class of mixed policies and provide an efficient and effective characterization, including optimality and non-redundancy. Specifically, we introduce a graphical criterion to identify unnecessary contexts for a set of actions, leading to a natural characterization of non-redundancy of mixed policies. We then derive sufficient conditions under which one strategy can dominate the other with respect to their maximum achievable expected rewards (optimality). This characterization leads to a fundamental understanding of the space of mixed policies and a possible refinement of the agent’s strategy so that it converges to the optimum faster and more robustly. One surprising result of the causal characterization is that the agent following a more standard approach—intervening on all intervenable variables and observing all available contexts—may be hurting itself, and will never achieve an optimal performance. \ No newline at end of file diff --git a/data/2020/neurips/Characterizing emergent representations in a space of candidate learning rules for deep networks b/data/2020/neurips/Characterizing emergent representations in a space of candidate learning rules for deep networks new file mode 100644 index 0000000000..94b3ab7f85 --- /dev/null +++ b/data/2020/neurips/Characterizing emergent representations in a space of candidate learning rules for deep networks @@ -0,0 +1 @@ +We apply singular value decomposition (SVD) to the dataset’s input-output correlation matrix to extract the component of the input-output mapping for different hierarchical levels. Suppose the desired (target) output matrix is given by $Y$ as shown in the main paper Fig. 1b, and the input matrix is $X$, where examples are placed in columns. In $X$, each object’s perceptual representation $x^\mu$ (a column vector, with $\mu = 1\ldots P$ indexing objects) is encoded by a one-hot input vector (Kronecker delta $\delta_{\mu i}$). Thus, the input-output correlation matrix is $\Sigma = YX^\top$. We use the SVD of $\Sigma$, i.e., $\Sigma = USV^\top$, which results in three key elements fully characterizing the input-output mapping to be learned (visualized in Supp. Fig. 1). For the case of hierarchically structured data from a binary tree, the SVD structure conforms to hierarchical distinctions in the dataset [4]. The first element is $U$, a feature-synthesizer matrix in which each column (a particular semantic dimension or ‘mode’) contains positive (negative) values for semantic features that objects categorized along this mode do (do not) possess. The second element is $S$, a singular value matrix that has nonzero values only on the diagonal, and these values are arranged in a descending order.
And the last element is $V^\top$, whose rows are object-analyzer vectors, whereby a binary code is assigned to classify objects according to each semantic mode (e.g., the 2nd row of $V^\top$ indicates that the first 4 objects are plants, while the last 4 objects are animals; see Fig. 1b in the main paper). \ No newline at end of file diff --git a/data/2020/neurips/Choice Bandits b/data/2020/neurips/Choice Bandits new file mode 100644 index 0000000000..cf99d08a52 --- /dev/null +++ b/data/2020/neurips/Choice Bandits @@ -0,0 +1 @@ +In this work, we study sequential choice bandits with feedback. We propose bandit algorithms for a platform that personalizes users' experience to maximize its rewards. For each action directed to a given user, the platform is given a positive reward, which is a non-decreasing function of the action, if this action is below the user's threshold. Users are equipped with a patience budget, and actions that are above the threshold decrease the user's patience. When all patience is lost, the user abandons the platform. The platform attempts to learn the thresholds of the users in order to maximize its rewards, based on two different feedback models describing the information pattern available to the platform at each action. We define a notion of regret by determining the best action to be taken when the platform knows that the user's threshold is in a given interval. We then propose bandit algorithms for the two feedback models and show that upper and lower bounds on the regret are of the order of $\tilde{O}(N^{2/3})$ and $\tilde\Omega(N^{2/3})$, respectively, where $N$ is the total number of users. Finally, we show that the waiting time of any user before receiving a personalized experience is uniform in $N$. \ No newline at end of file diff --git a/data/2020/neurips/CircleGAN: Generative Adversarial Learning across Spherical Circles b/data/2020/neurips/CircleGAN: Generative Adversarial Learning across Spherical Circles new file mode 100644 index 0000000000..6dc3de578e --- /dev/null +++ b/data/2020/neurips/CircleGAN: Generative Adversarial Learning across Spherical Circles @@ -0,0 +1 @@ +We present a novel discriminator for GANs that improves realness and diversity of generated samples by learning a structured hypersphere embedding space using spherical circles. The proposed discriminator learns to populate realistic samples around the longest spherical circle, i.e., a great circle, while pushing unrealistic samples toward the poles perpendicular to the great circle. Since longer circles occupy larger area on the hypersphere, they encourage more diversity in representation learning, and vice versa. Discriminating samples based on their corresponding spherical circles can thus naturally induce diversity to generated samples. We also extend the proposed method for conditional settings with class labels by creating a hypersphere for each category and performing class-wise discrimination and update. In experiments, we validate the effectiveness for both unconditional and conditional generation on standard benchmarks, achieving the state of the art.
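The SVD analysis in the "Characterizing emergent representations" abstract above can be reproduced on a toy hierarchy; the dataset below is invented for illustration and is not the paper's.

```python
# Sketch of the SVD of an input-output correlation matrix for a toy
# hierarchically structured dataset (illustrative, not the paper's data).
import numpy as np

P = 8                                    # 8 objects, one-hot inputs
X = np.eye(P)                            # inputs placed in columns
Y = np.array([                           # target semantic features
    [2]*8,                               # top-level feature, strongly shared
    [1]*4 + [-1]*4,                      # plants vs animals
    [1, 1, -1, -1, 0, 0, 0, 0],          # pairs within plants
    [0, 0, 0, 0, 1, 1, -1, -1],          # pairs within animals
], dtype=float)

Sigma = Y @ X.T                          # input-output correlation matrix
U, S, Vt = np.linalg.svd(Sigma)          # Sigma = U S V^T
print("singular values (descending):", S.round(2))
# The 2nd object-analyzer vector splits plants from animals (up to sign).
print("2nd row of V^T:", Vt[1].round(2))
```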
\ No newline at end of file diff --git a/data/2020/neurips/Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability b/data/2020/neurips/Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability new file mode 100644 index 0000000000..f01bd6fcbf --- /dev/null +++ b/data/2020/neurips/Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Evolvability @@ -0,0 +1 @@ +In this paper, we revisit the problem of distribution-independently learning halfspaces under Massart noise with rate $\eta$. Recent work [DGT19] resolved a long-standing problem in this model: efficiently learning to error $\eta + \epsilon$ for any $\epsilon > 0$, by giving an improper learner that partitions space into $\mathrm{poly}(d, 1/\epsilon)$ regions. Here we give a much simpler algorithm and settle a number of outstanding open questions: (1) We give the first proper learner for Massart halfspaces that achieves $\eta + \epsilon$. (2) Based on (1), we develop a blackbox knowledge distillation procedure to convert an arbitrarily complex classifier to an equally good proper classifier. (3) By leveraging a simple but overlooked connection to evolvability, we show any SQ algorithm requires super-polynomially many queries to achieve $\mathrm{OPT} + \epsilon$. We then zoom out to study generalized linear models and give an efficient algorithm for learning under a challenging new corruption model generalizing Massart noise. Lastly, we empirically evaluate our algorithm for Massart halfspaces and find it exhibits some intriguing fairness properties. \ No newline at end of file diff --git a/data/2020/neurips/Classification with Valid and Adaptive Coverage b/data/2020/neurips/Classification with Valid and Adaptive Coverage new file mode 100644 index 0000000000..6b829a4fb9 --- /dev/null +++ b/data/2020/neurips/Classification with Valid and Adaptive Coverage @@ -0,0 +1 @@ +Conformal inference, cross-validation+, and the jackknife+ are hold-out methods that can be combined with virtually any machine learning algorithm to construct prediction sets with guaranteed marginal coverage. In this paper, we develop specialized versions of these techniques for categorical and unordered response labels that, in addition to providing marginal coverage, are also fully adaptive to complex data distributions, in the sense that they perform favorably in terms of approximate conditional coverage compared to alternative methods. The heart of our contribution is a novel conformity score, which we explicitly demonstrate to be powerful and intuitive for classification problems, but whose underlying principle is potentially far more general. Experiments on synthetic and real data demonstrate the practical value of our theoretical guarantees, as well as the statistical advantages of the proposed methods over the existing alternatives. \ No newline at end of file diff --git a/data/2020/neurips/Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow b/data/2020/neurips/Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow new file mode 100644 index 0000000000..2b51809d1e --- /dev/null +++ b/data/2020/neurips/Closing the Dequantization Gap: PixelCNN as a Single-Layer Flow @@ -0,0 +1 @@ +Flow models have recently made great progress at modeling ordinal discrete data such as images and audio. Due to the continuous nature of flow models, dequantization is typically applied when using them for such discrete data, resulting in lower bound estimates of the likelihood.
In this paper, we introduce subset flows, a class of flows that can tractably transform finite volumes and thus allow exact computation of likelihoods for discrete data. Based on subset flows, we identify ordinal discrete autoregressive models, including WaveNets, PixelCNNs and Transformers, as single-layer flows. We use the flow formulation to compare models trained and evaluated with either the exact likelihood or its dequantization lower bound. Finally, we study multilayer flows composed of PixelCNNs and non-autoregressive coupling layers and demonstrate state-of-the-art results on CIFAR-10 for flow models trained with dequantization. \ No newline at end of file diff --git a/data/2020/neurips/Co-Tuning for Transfer Learning b/data/2020/neurips/Co-Tuning for Transfer Learning new file mode 100644 index 0000000000..6181d90e2d --- /dev/null +++ b/data/2020/neurips/Co-Tuning for Transfer Learning @@ -0,0 +1 @@ +Fine-tuning pre-trained deep neural networks (DNNs) to a target dataset, also known as transfer learning, is widely used in computer vision and NLP. Because task-specific layers mainly contain categorical information and categories vary with datasets, practitioners only partially transfer pre-trained models by discarding task-specific layers and fine-tuning bottom layers. However, it is a reckless loss to simply discard task-specific parameters which take up as much as 20% of the total parameters in pre-trained models. To fully transfer pre-trained models, we propose a two-step framework named Co-Tuning : (i) learn the relationship between source categories and target categories from the pre-trained model with calibrated predictions; (ii) target labels (one-hot labels), as well as source labels (probabilistic labels) translated by the category relationship, collaboratively supervise the fine-tuning process. A simple instantiation of the framework shows strong empirical results in four visual classification tasks and one NLP classification task, bringing up to 20% relative improvement. While state-of-the-art fine-tuning techniques mainly focus on how to impose regularization when data are not abundant, Co-Tuning works not only in medium-scale datasets (100 samples per class) but also in large-scale datasets (1000 samples per class) where regularization-based methods bring no gains over vanilla fine-tuning. Co-Tuning relies on a typically valid assumption that the pre-trained dataset is diverse enough, implying its broad applicability. \ No newline at end of file diff --git a/data/2020/neurips/Co-exposure Maximization in Online Social Networks b/data/2020/neurips/Co-exposure Maximization in Online Social Networks new file mode 100644 index 0000000000..449d3dad8c --- /dev/null +++ b/data/2020/neurips/Co-exposure Maximization in Online Social Networks @@ -0,0 +1 @@ +Social media has created new ways for citizens to stay informed on societal matters and participate in political discourse. However, with its algorithmically-curated and virally-propagating content, social media has contributed further to the polarization of opinions by reinforcing users’ existing viewpoints. An emerging line of research seeks to understand how content-recommendation algorithms can be re-designed to mitigate societal polarization amplified by social-media interactions.
In this paper, we study the problem of allocating seed users to opposing campaigns: by drawing on the equal-time rule of political campaigning on traditional media, our goal is to allocate seed users to campaigners with the aim of maximizing the expected number of users who are co-exposed to both campaigns. We show that the problem of maximizing co-exposure is NP-hard and its objective function is neither submodular nor supermodular. However, by exploiting a connection to a submodular function that acts as a lower bound to the objective, we are able to devise a greedy algorithm with a provable approximation guarantee. We further provide a scalable instantiation of our approximation algorithm by introducing a novel extension to the notion of random reverse-reachable sets for efficiently estimating the expected co-exposure. We experimentally demonstrate the quality of our proposal on real-world social networks. \ No newline at end of file diff --git a/data/2020/neurips/CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection b/data/2020/neurips/CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection new file mode 100644 index 0000000000..ff68dc5d08 --- /dev/null +++ b/data/2020/neurips/CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection @@ -0,0 +1 @@ +Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images. One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships. In this paper, we present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images. First, we integrate saliency priors into the backbone features to suppress the redundant background information through an online intra-saliency guidance structure. After that, we design a two-stage aggregate-and-distribute architecture to explore group-wise semantic interactions and produce the co-saliency features. In the first stage, we propose a group-attentional semantic aggregation module that models inter-image relationships to generate the group-wise semantic representations. In the second stage, we propose a gated group distribution module that adaptively distributes the learned group semantics to different individuals in a dynamic gating mechanism. Finally, we develop a group consistency preserving decoder tailored for the CoSOD task, which maintains group constraints during feature decoding to predict more consistent full-resolution co-saliency maps. The proposed CoADNet is evaluated on four prevailing CoSOD benchmark datasets, which demonstrates the remarkable performance improvement over ten state-of-the-art competitors. \ No newline at end of file diff --git a/data/2020/neurips/CoMIR: Contrastive Multimodal Image Representation for Registration b/data/2020/neurips/CoMIR: Contrastive Multimodal Image Representation for Registration new file mode 100644 index 0000000000..10eafbf468 --- /dev/null +++ b/data/2020/neurips/CoMIR: Contrastive Multimodal Image Representation for Registration @@ -0,0 +1 @@ +We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations).
CoMIRs enable the registration of multimodal images where existing registration methods often fail due to a lack of sufficiently similar image structures. CoMIRs reduce the multimodal registration problem to a monomodal one, in which general intensity-based, as well as feature-based, registration algorithms can be applied. The method involves training one neural network per modality on aligned images, using a contrastive loss based on noise-contrastive estimation (InfoNCE). Unlike other contrastive coding methods used for, e.g., classification, our approach generates image-like representations that contain the information shared between modalities. We introduce a novel, hyperparameter-free modification to InfoNCE, to enforce rotational equivariance of the learnt representations, a property essential to the registration task. We assess the extent of achieved rotational equivariance and the stability of the representations with respect to weight initialization, training set, and hyperparameter settings, on a remote sensing dataset of RGB and near-infrared images. We evaluate the learnt representations through registration of a biomedical dataset of bright-field and second-harmonic generation microscopy images; two modalities with very little apparent correlation. The proposed approach based on CoMIRs significantly outperforms registration of representations created by GAN-based image-to-image translation, as well as a state-of-the-art, application-specific method which takes additional knowledge about the data into account. Code is available at: this https URL. \ No newline at end of file diff --git a/data/2020/neurips/CoSE: Compositional Stroke Embeddings b/data/2020/neurips/CoSE: Compositional Stroke Embeddings new file mode 100644 index 0000000000..7ee3855cb3 --- /dev/null +++ b/data/2020/neurips/CoSE: Compositional Stroke Embeddings @@ -0,0 +1 @@ +We present a generative model for stroke-based drawing tasks which is able to model complex free-form structures. While previous approaches rely on sequence-based models for drawings of basic objects or handwritten text, we propose a model that treats drawings as a collection of strokes that can be composed into complex structures such as diagrams (e.g., flow-charts). At the core of the approach lies a novel auto-encoder that projects variable-length strokes into a latent space of fixed dimension. This representation space allows a relational model, operating in latent space, to better capture the relationship between strokes and to predict subsequent strokes. We demonstrate qualitatively and quantitatively that our proposed approach is able to model the appearance of individual strokes, as well as the compositional structure of larger diagram drawings. Our approach is suitable for interactive use cases such as auto-completing diagrams. \ No newline at end of file diff --git a/data/2020/neurips/CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching b/data/2020/neurips/CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching new file mode 100644 index 0000000000..9937f1e932 --- /dev/null +++ b/data/2020/neurips/CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching @@ -0,0 +1 @@ +Previous work indicates that the CFG is a sensitive feature because it changes greatly under different optimizations. As a consequence, our results decline somewhat, but within acceptable limits.
To train a cross-platform model, we extract the token sequences based on IDA Pro microcode IR (Intermediate Representation). The evaluated settings are combinations of different compilers (gcc/clang), different platforms (x86/x64/arm/arm64), and different optimizations (O0/O1/O2/O3). The average recall@1 score of the 32 combinations is 88.9%. The lowest recall@1 score among the 32 combinations is 85.1% on gcc-arm64-O3, which is acceptable. \ No newline at end of file diff --git a/data/2020/neurips/Coded Sequential Matrix Multiplication For Straggler Mitigation b/data/2020/neurips/Coded Sequential Matrix Multiplication For Straggler Mitigation new file mode 100644 index 0000000000..63d46d580c --- /dev/null +++ b/data/2020/neurips/Coded Sequential Matrix Multiplication For Straggler Mitigation @@ -0,0 +1 @@ +In this work, we consider a sequence of $J$ matrix multiplication jobs which needs to be distributed by a master across multiple worker nodes. For $i\in \{1,2,\ldots,J\}$ , job- $i$ begins in round- $i$ and has to be completed by round- $(i+T)$ . In order to provide resiliency against slow workers (stragglers), previous works focus on coding across workers, which is the special case of $T=0$ . We propose here two schemes with $T > 0$ , which allow for coding across workers as well as the dimension of time. Our first scheme is a modification of the polynomial coding scheme introduced by Yu et al. and places no assumptions on the straggler model. Exploitation of the temporal dimension helps the scheme handle a larger set of straggler patterns than the polynomial coding scheme, for a given computational load per worker per round. The second scheme assumes a particular straggler model to further improve performance (in terms of encoding/decoding complexity). We develop theoretical results establishing (i) optimality of our proposed schemes for certain classes of straggler patterns and (ii) improved performance for the case of i.i.d. stragglers. These are further validated by experiments, where we implement our schemes to train neural networks. \ No newline at end of file diff --git a/data/2020/neurips/CogLTX: Applying BERT to Long Texts b/data/2020/neurips/CogLTX: Applying BERT to Long Texts new file mode 100644 index 0000000000..35bea0162e --- /dev/null +++ b/data/2020/neurips/CogLTX: Applying BERT to Long Texts @@ -0,0 +1 @@ +BERT is incapable of processing long texts due to its quadratically increasing memory and time consumption. The most natural ways to address this problem, such as slicing the text by a sliding window or simplifying transformers, suffer from insufficient long-range attention or need customized CUDA kernels. The maximum length limit in BERT reminds us of the limited capacity (5∼9 chunks) of the working memory of humans: then how do human beings Cognize Long TeXts? Founded on the cognitive theory stemming from Baddeley [2], the proposed CogLTX framework identifies key sentences by training a judge model, concatenates them for reasoning, and enables multi-step reasoning via rehearsal and decay. Since relevance annotations are usually unavailable, we propose to use interventions to create supervision. As a general algorithm, CogLTX outperforms or achieves comparable results to SOTA models on various downstream tasks with memory overheads independent of the length of text.
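A toy sketch of the polynomial-coding idea that the coded matrix multiplication abstract above builds on (its $T=0$ special case); the block sizes, evaluation points, and straggler pattern here are illustrative, not the paper's scheme.

```python
# Toy polynomial coding for straggler-resilient matrix multiplication:
# any m=2 of n=3 workers suffice to recover A @ B.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(3, 2))
A1, A2 = A[:2], A[2:]                  # split A into m=2 row blocks

# Worker w receives the evaluation A1 + w*A2 and returns (A1 + w*A2) @ B,
# a degree-1 matrix polynomial in w.
points = [1.0, 2.0, 3.0]
results = {w: (A1 + w * A2) @ B for w in points}
del results[2.0]                       # worker 2 straggles and never replies

# Interpolate the linear polynomial from the two surviving evaluations.
(w0, R0), (w1, R1) = results.items()
A2B = (R1 - R0) / (w1 - w0)            # slope  = A2 @ B
A1B = R0 - w0 * A2B                    # intercept = A1 @ B
assert np.allclose(np.vstack([A1B, A2B]), A @ B)
print("recovered A @ B despite a straggler")
```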
\ No newline at end of file diff --git a/data/2020/neurips/CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models b/data/2020/neurips/CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models new file mode 100644 index 0000000000..d7dd735282 --- /dev/null +++ b/data/2020/neurips/CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models @@ -0,0 +1 @@ +The novel nature of SARS-CoV-2 calls for the development of efficient de novo drug design approaches. In this study, we propose an end-to-end framework, named CogMol (Controlled Generation of Molecules), for designing new drug-like small molecules targeting novel viral proteins with high affinity and off-target selectivity. CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme that uses guidance from attribute predictors trained on latent features. To generate novel and optimal drug-like molecules for unseen viral targets, CogMol leverages a protein-molecule binding affinity predictor that is trained using SMILES VAE embeddings and protein sequence embeddings learned unsupervised from a large corpus. The CogMol framework is applied to three SARS-CoV-2 target proteins: main protease, receptor-binding domain of the spike protein, and non-structural protein 9 replicase. The generated candidates are novel at both molecular and chemical scaffold levels when compared to the training data. CogMol also includes in silico screening for assessing toxicity of parent molecules and their metabolites with a multi-task toxicity classifier, synthetic feasibility with a chemical retrosynthesis predictor, and target structure binding with docking simulations. Docking reveals favorable binding of generated molecules to the target protein structure, where 87-95% of high-affinity molecules showed docking free energy < -6 kcal/mol. When compared to approved drugs, the majority of designed compounds show low parent molecule and metabolite toxicity and high synthetic feasibility. In summary, CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity, and does not need target-dependent fine-tuning of the framework or target structure information. \ No newline at end of file diff --git a/data/2020/neurips/Coherent Hierarchical Multi-Label Classification Networks b/data/2020/neurips/Coherent Hierarchical Multi-Label Classification Networks new file mode 100644 index 0000000000..c514c3d6a8 --- /dev/null +++ b/data/2020/neurips/Coherent Hierarchical Multi-Label Classification Networks @@ -0,0 +1 @@ +Hierarchical multi-label classification (HMC) is a challenging classification task extending standard multi-label classification problems by imposing a hierarchy constraint on the classes. In this paper, we propose C-HMCNN(h), a novel approach for HMC problems, which, given a network h for the underlying multi-label classification problem, exploits the hierarchy information in order to produce predictions coherent with the constraint and improve performance. We conduct an extensive experimental analysis showing the superior performance of C-HMCNN(h) when compared to state-of-the-art models.
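One simple way to obtain predictions coherent with a class hierarchy, in the spirit of the C-HMCNN(h) abstract above, is to give each class the maximum score over its descendants, so a parent's score never falls below a child's; the tiny hierarchy and scores below are invented for illustration and do not reproduce the paper's exact module.

```python
# Sketch: enforce p(parent) >= p(child) by taking, for each class, the
# maximum raw score over the class and all of its descendants.
import numpy as np

# Hierarchy: 0 is the root; 1 and 2 are its children; 3 is a child of 1.
children = {0: [1, 2], 1: [3], 2: [], 3: []}

def descendants(c):
    out = {c}
    for ch in children[c]:
        out |= descendants(ch)
    return out

h = np.array([0.2, 0.7, 0.1, 0.9])      # raw per-class scores from network h
coherent = np.array([max(h[d] for d in descendants(c)) for c in range(4)])
print(coherent)                          # [0.9 0.9 0.1 0.9]: parent >= child
```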
\ No newline at end of file diff --git a/data/2020/neurips/CoinDICE: Off-Policy Confidence Interval Estimation b/data/2020/neurips/CoinDICE: Off-Policy Confidence Interval Estimation new file mode 100644 index 0000000000..0000faa16c --- /dev/null +++ b/data/2020/neurips/CoinDICE: Off-Policy Confidence Interval Estimation @@ -0,0 +1 @@ +We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies. Starting from a function space embedding of the linear program formulation of the $Q$-function, we obtain an optimization problem with generalized estimating equation constraints. By applying the generalized empirical likelihood method to the resulting Lagrangian, we propose CoinDICE, a novel and efficient algorithm for computing confidence intervals. Theoretically, we prove the obtained confidence intervals are valid, in both asymptotic and finite-sample regimes. Empirically, we show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods. \ No newline at end of file diff --git a/data/2020/neurips/CoinPress: Practical Private Mean and Covariance Estimation b/data/2020/neurips/CoinPress: Practical Private Mean and Covariance Estimation new file mode 100644 index 0000000000..5e16262a76 --- /dev/null +++ b/data/2020/neurips/CoinPress: Practical Private Mean and Covariance Estimation @@ -0,0 +1 @@ +We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data that are accurate at small sample sizes. We demonstrate the effectiveness of our algorithms both theoretically and empirically using synthetic and real-world datasets---showing that their asymptotic error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods. Specifically, previous estimators either have weak empirical accuracy at small sample sizes, perform poorly for multivariate data, or require the user to provide strong a priori estimates for the parameters. \ No newline at end of file diff --git a/data/2020/neurips/ColdGANs: Taming Language GANs with Cautious Sampling Strategies b/data/2020/neurips/ColdGANs: Taming Language GANs with Cautious Sampling Strategies new file mode 100644 index 0000000000..4730acc33d --- /dev/null +++ b/data/2020/neurips/ColdGANs: Taming Language GANs with Cautious Sampling Strategies @@ -0,0 +1 @@ +Training regimes based on Maximum Likelihood Estimation (MLE) suffer from known limitations, often leading to poorly generated text sequences. At the root of these limitations is the mismatch between training and inference, i.e. the so-called exposure bias, exacerbated by considering only the reference texts as correct, while in practice several alternative formulations could be as good. Generative Adversarial Networks (GANs) can mitigate those limitations but the discrete nature of text has hindered their application to language generation: the approaches proposed so far, based on Reinforcement Learning, have been shown to underperform MLE. Departing from previous works, we analyze the exploration step in GANs applied to text generation, and show how classical sampling results in unstable training. 
We propose to consider alternative exploration strategies in a GAN framework that we name ColdGANs, where we force the sampling to be close to the distribution modes to get smoother learning dynamics. For the first time, to the best of our knowledge, the proposed language GANs compare favorably to MLE, and obtain improvements over the state-of-the-art on three generative tasks, namely unconditional text generation, question generation, and abstractive summarization. \ No newline at end of file diff --git a/data/2020/neurips/Collapsing Bandits and Their Application to Public Health Intervention b/data/2020/neurips/Collapsing Bandits and Their Application to Public Health Intervention new file mode 100644 index 0000000000..f5ca428aae --- /dev/null +++ b/data/2020/neurips/Collapsing Bandits and Their Application to Public Health Intervention @@ -0,0 +1 @@ +We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus "collapsing" any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the "good" state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either "forward" or "reverse" threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed-form expression. (iii) We evaluate our algorithm on several data distributions including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients' adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques while achieving similar performance. \ No newline at end of file diff --git a/data/2020/neurips/Collegial Ensembles b/data/2020/neurips/Collegial Ensembles new file mode 100644 index 0000000000..142a094d21 --- /dev/null +++ b/data/2020/neurips/Collegial Ensembles @@ -0,0 +1 @@ +Modern neural network performance typically improves as model size increases. A recent line of research on the Neural Tangent Kernel (NTK) of over-parameterized networks indicates that the improvement with size increase is a product of a better conditioned loss landscape. In this work, we investigate a form of over-parameterization achieved through ensembling, where we define collegial ensembles (CE) as the aggregation of multiple independent models with identical architectures, trained as a single model. We show that the optimization dynamics of CE simplify dramatically when the number of models in the ensemble is large, resembling the dynamics of wide models, yet scale much more favorably.
We use recent theoretical results on the finite-width corrections of the NTK to perform efficient architecture search in a space of finite-width CEs, aiming to either minimize capacity or maximize trainability under a set of constraints. The resulting ensembles can be efficiently implemented in practical architectures using group convolutions and block diagonal layers. Finally, we show how our framework can be used to analytically derive optimal group convolution modules originally found using expensive grid searches, without having to train a single model. \ No newline at end of file diff --git a/data/2020/neurips/Color Visual Illusions: A Statistics-based Computational Model b/data/2020/neurips/Color Visual Illusions: A Statistics-based Computational Model new file mode 100644 index 0000000000..e899995a86 --- /dev/null +++ b/data/2020/neurips/Color Visual Illusions: A Statistics-based Computational Model @@ -0,0 +1 @@ +Visual illusions may be explained by the likelihood of patches in real-world images, as argued by input-driven paradigms in neuroscience. However, neither the data nor the tools existed in the past to extensively support these explanations. The era of big data opens a new opportunity to study input-driven approaches. We introduce a tool that computes the likelihood of patches, given a large dataset to learn from. Given this tool, we present a model that supports the approach and explains lightness and color visual illusions in a unified manner. Furthermore, our model generates visual illusions in natural images, by applying the same tool in reverse. \ No newline at end of file diff --git a/data/2020/neurips/Combining Deep Reinforcement Learning and Search for Imperfect-Information Games b/data/2020/neurips/Combining Deep Reinforcement Learning and Search for Imperfect-Information Games new file mode 100644 index 0000000000..e4e69783cb --- /dev/null +++ b/data/2020/neurips/Combining Deep Reinforcement Learning and Search for Imperfect-Information Games @@ -0,0 +1 @@ +The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by the success of AlphaZero. However, algorithms of this form have been unable to cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search for imperfect-information games. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results show ReBeL leads to low exploitability in benchmark imperfect-information games and achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI. We also prove that ReBeL converges to a Nash equilibrium in two-player zero-sum games in tabular settings. \ No newline at end of file diff --git a/data/2020/neurips/Community detection in sparse time-evolving graphs with a dynamical Bethe-Hessian b/data/2020/neurips/Community detection in sparse time-evolving graphs with a dynamical Bethe-Hessian new file mode 100644 index 0000000000..5fedc70f49 --- /dev/null +++ b/data/2020/neurips/Community detection in sparse time-evolving graphs with a dynamical Bethe-Hessian @@ -0,0 +1 @@ +This article considers the problem of community detection in sparse dynamical graphs in which the community structure evolves over time.
A fast spectral algorithm based on an extension of the Bethe-Hessian matrix is proposed, which benefits from the positive correlation in the class labels and in their temporal evolution and is designed to be applicable to any dynamical graph with a community structure. Under the dynamical degree-corrected stochastic block model, in the case of two classes of equal size, we demonstrate and support with extensive simulations that our proposed algorithm is capable of non-trivial community reconstruction as soon as theoretically possible, thereby reaching the optimal detectability threshold and provably outperforming competing spectral methods. \ No newline at end of file diff --git "a/data/2020/neurips/Community detection using fast low-cardinality semidefinite programming\342\200\251" "b/data/2020/neurips/Community detection using fast low-cardinality semidefinite programming\342\200\251" new file mode 100644 index 0000000000..b37be71ba6 --- /dev/null +++ "b/data/2020/neurips/Community detection using fast low-cardinality semidefinite programming\342\200\251" @@ -0,0 +1 @@ +Modularity maximization has been a fundamental tool for understanding the community structure of a network, but the underlying optimization problem is nonconvex and NP-hard to solve. State-of-the-art algorithms like the Louvain or Leiden methods focus on different heuristics to help escape local optima, but they still depend on a greedy step that moves node assignment locally and is prone to getting trapped. In this paper, we propose a new class of low-cardinality algorithms that generalize the local update to maximize a semidefinite relaxation derived from max-k-cut. This proposed algorithm is scalable, empirically achieves the global semidefinite optimality for small cases, and outperforms the state-of-the-art algorithms in real-world datasets with little additional time cost. From the algorithmic perspective, it also opens a new avenue for scaling up semidefinite programming when the solutions are sparse instead of low-rank. \ No newline at end of file diff --git a/data/2020/neurips/Compact task representations as a normative model for higher-order brain activity b/data/2020/neurips/Compact task representations as a normative model for higher-order brain activity new file mode 100644 index 0000000000..66d99575c2 --- /dev/null +++ b/data/2020/neurips/Compact task representations as a normative model for higher-order brain activity @@ -0,0 +1 @@ +Higher-order brain areas such as the frontal cortices are considered essential for the flexible solution of tasks. However, the precise computational role of these areas is still debated. Indeed, even for the simplest of tasks, we cannot really explain how the measured brain activity, which evolves over time in complicated ways, relates to the task structure. Here, we follow a normative approach, based on integrating the principle of efficient coding with the framework of Markov decision processes (MDP). More specifically, we focus on MDPs whose state is based on action-observation histories, and we show how to compress the state space such that unnecessary redundancy is eliminated, while task-relevant information is preserved. We show that the efficiency of a state space representation depends on the (long-term) behavioural goal of the agent, and we distinguish between model-based and habitual agents.
We apply our approach to simple tasks that require short-term memory, and we show that the efficient state space representations reproduce the key dynamical features of recorded neural activity in frontal areas (such as ramping, sequentiality, persistence). If we additionally assume that neural systems are subject to cost-accuracy tradeoffs, we find a surprising match to neural data at the population level. \ No newline at end of file diff --git a/data/2020/neurips/Comparator-Adaptive Convex Bandits b/data/2020/neurips/Comparator-Adaptive Convex Bandits new file mode 100644 index 0000000000..eff32f1e64 --- /dev/null +++ b/data/2020/neurips/Comparator-Adaptive Convex Bandits @@ -0,0 +1 @@ +We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart. Specifically, we develop convex bandit algorithms with regret bounds that are small whenever the norm of the comparator is small. We first use techniques from the full-information setting to develop comparator-adaptive algorithms for linear bandits. Then, we extend the ideas to convex bandits with Lipschitz or smooth loss functions, using a new single-point gradient estimator and carefully designed surrogate losses. \ No newline at end of file diff --git a/data/2020/neurips/Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval b/data/2020/neurips/Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval new file mode 100644 index 0000000000..e328964666 --- /dev/null +++ b/data/2020/neurips/Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval @@ -0,0 +1 @@ +Despite the widespread use of gradient-based algorithms for optimizing high-dimensional non-convex functions, understanding their ability to find good minima instead of being trapped in spurious ones remains to a large extent an open problem. Here we focus on gradient flow dynamics for phase retrieval from random measurements. When the ratio of the number of measurements to the input dimension is small, the dynamics remains trapped in spurious minima with large basins of attraction. We find analytically that above a critical ratio those critical points become unstable, developing a negative direction toward the signal. By numerical experiments we show that in this regime the gradient flow algorithm is not trapped; it drifts away from the spurious critical points along the unstable direction and succeeds in finding the global minimum. Using tools from statistical physics we characterize this phenomenon, which is related to a BBP-type transition in the Hessian of the spurious minima. \ No newline at end of file diff --git a/data/2020/neurips/Compositional Explanations of Neurons b/data/2020/neurips/Compositional Explanations of Neurons new file mode 100644 index 0000000000..d0eac2d450 --- /dev/null +++ b/data/2020/neurips/Compositional Explanations of Neurons @@ -0,0 +1 @@ +We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts that closely approximate neuron behavior. Compared to prior work that uses atomic labels as explanations, analyzing neurons compositionally allows us to characterize their behavior more precisely and expressively. We use this procedure to answer several questions on interpretability in models for vision and natural language processing. First, we examine the kinds of abstractions learned by neurons.
In image classification, we find that many neurons learn highly abstract but semantically coherent visual concepts, while other polysemantic neurons detect multiple unrelated features; in natural language inference (NLI), neurons learn shallow lexical heuristics from dataset biases. Second, we ask whether compositional explanations give us insight into model performance: vision neurons that detect human-interpretable concepts are positively correlated with task performance, while NLI neurons that fire for shallow heuristics are negatively correlated with task performance. Finally, we show how compositional explanations provide an accessible way for end users to produce simple "copy-paste" adversarial examples that change model behavior in predictable ways. \ No newline at end of file diff --git a/data/2020/neurips/Compositional Generalization by Learning Analytical Expressions b/data/2020/neurips/Compositional Generalization by Learning Analytical Expressions new file mode 100644 index 0000000000..0b7473a877 --- /dev/null +++ b/data/2020/neurips/Compositional Generalization by Learning Analytical Expressions @@ -0,0 +1 @@ +Compositional generalization is a basic but essential intellectual capability of human beings, which allows us to readily recombine known parts. However, existing neural network based models have been proven to be extremely deficient in such a capability. Inspired by work in cognition which argues that compositionality can be captured by variable slots with symbolic functions, we present a refreshing view that connects a memory-augmented neural model with analytical expressions to achieve compositional generalization. Our model consists of two cooperative neural modules, Composer and Solver, fitting well with the cognitive argument while still being trained end-to-end via a hierarchical reinforcement learning algorithm. Experiments on the well-known SCAN benchmark demonstrate that our model achieves strong compositional generalization, solving all challenges addressed by previous work with 100% accuracy. \ No newline at end of file diff --git a/data/2020/neurips/Compositional Generalization via Neural-Symbolic Stack Machines b/data/2020/neurips/Compositional Generalization via Neural-Symbolic Stack Machines new file mode 100644 index 0000000000..d43944df28 --- /dev/null +++ b/data/2020/neurips/Compositional Generalization via Neural-Symbolic Stack Machines @@ -0,0 +1 @@ +Despite achieving tremendous success, existing deep learning models have exposed limitations in compositional generalization, the capability to learn compositional rules and apply them to unseen cases in a systematic manner. To tackle this issue, we propose the Neural-Symbolic Stack Machine (NeSS). It contains a neural network to generate traces, which are then executed by a symbolic stack machine enhanced with sequence manipulation operations. NeSS combines the expressive power of neural sequence models with the recursion supported by the symbolic stack machine. Without training supervision on execution traces, NeSS achieves 100% generalization performance in four domains: the SCAN benchmark of language-driven navigation tasks, the task of few-shot learning of compositional instructions, the compositional machine translation benchmark, and context-free grammar parsing tasks.
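To make the stack-machine idea concrete, here is a minimal Python sketch of a symbolic machine executing a hand-written trace on a SCAN-style input. The instruction set (PUSH/REDUCE/REPEAT/OUTPUT) and the rewrite rules are illustrative assumptions, not the operation set NeSS actually uses, and the trace is fixed here rather than generated by a neural network as in the paper.

# Minimal sketch of a symbolic stack machine in the spirit of NeSS.
# Hypothetical instruction set and toy rewrite rules for exposition only.
RULES = {"jump": ["JUMP"], "walk": ["WALK"]}  # terminal rewrites

def execute(trace, tokens):
    """Run a trace of instructions over the input tokens."""
    tokens, stack, out = list(tokens), [], []
    for op in trace:
        if op == "PUSH":      # shift the next input token onto the stack
            stack.append(tokens.pop(0))
        elif op == "REDUCE":  # rewrite the top token via a symbolic rule
            stack.append(RULES[stack.pop()])
        elif op == "REPEAT":  # apply a modifier like "twice" to a sequence
            mod, seq = stack.pop(), stack.pop()
            stack.append(seq * {"twice": 2, "thrice": 3}[mod])
        elif op == "OUTPUT":  # emit the finished action sequence
            out.extend(stack.pop())
    return out

# "jump twice" -> ['JUMP', 'JUMP'] under the toy rules above
print(execute(["PUSH", "REDUCE", "PUSH", "REPEAT", "OUTPUT"], ["jump", "twice"]))

The recursion that makes such machines generalize comes from the fact that the same small instruction set applies at every nesting depth, so a trace for a longer command reuses the rules unchanged.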
\ No newline at end of file diff --git a/data/2020/neurips/Compositional Visual Generation with Energy Based Models b/data/2020/neurips/Compositional Visual Generation with Energy Based Models new file mode 100644 index 0000000000..63cbe75c66 --- /dev/null +++ b/data/2020/neurips/Compositional Visual Generation with Energy Based Models @@ -0,0 +1 @@ +A vital aspect of human intelligence is the ability to compose increasingly complex concepts out of simpler ideas, enabling both rapid learning and adaptation of knowledge. In this paper we show that energy-based models can exhibit this ability by directly combining probability distributions. Samples from the combined distribution correspond to compositions of concepts. For example, given one distribution for smiling face images and another for male faces, we can combine them to generate smiling male faces. This allows us to generate natural images that simultaneously satisfy conjunctions, disjunctions, and negations of concepts. We evaluate the compositional generation abilities of our model on the CelebA dataset of natural faces and on synthetic 3D scene images. We showcase the breadth of unique capabilities of our model, such as the ability to continually learn and incorporate new concepts, or to infer compositions of concept properties underlying an image. \ No newline at end of file diff --git a/data/2020/neurips/Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition b/data/2020/neurips/Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition new file mode 100644 index 0000000000..be8af1b865 --- /dev/null +++ b/data/2020/neurips/Compositional Zero-Shot Learning via Fine-Grained Dense Feature Composition @@ -0,0 +1 @@ +We develop a novel generative model for zero-shot learning to recognize fine-grained unseen classes without training samples. Our observation is that generating holistic features of unseen classes fails to capture every attribute needed to distinguish small differences among classes. We propose a feature composition framework that learns to extract attribute-based features from training samples and combines them to construct fine-grained features for unseen classes. Feature composition allows us not only to selectively compose features of unseen classes from only the relevant training samples, but also to obtain diversity among composed features by changing the samples used for composition. In addition, instead of building a global feature of an unseen class, we use all attribute-based features to form a dense representation consisting of fine-grained attribute details. To recognize unseen classes, we propose a novel training scheme that uses a discriminative model to construct features that are subsequently used to train itself. Therefore, we directly train the discriminative model on composed features without learning separate generative models. We conduct experiments on four popular datasets, DeepFashion, AWA2, CUB, and SUN, showing that our method significantly improves the state of the art.
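As a rough illustration of the feature-composition step, the Python sketch below builds a dense feature for an unseen class by borrowing each required attribute's feature slice from a randomly chosen training sample that exhibits that attribute. The array shapes, the binary attribute encoding, and the concatenation into a dense vector are assumptions for exposition, not the paper's exact construction.

import numpy as np

# Assumed shapes: train_feats is (n, A, d), one d-dim feature per attribute
# per sample; train_attrs is (n, A) binary; unseen_attrs is an (A,) binary vector.
def compose_unseen(train_feats, train_attrs, unseen_attrs, rng):
    parts = []
    for a in np.flatnonzero(unseen_attrs):
        donors = np.flatnonzero(train_attrs[:, a])  # samples having attribute a
        i = rng.choice(donors)                      # random donor -> diversity
        parts.append(train_feats[i, a])             # borrow its attribute feature
    return np.concatenate(parts)                    # dense, fine-grained representation

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(100, 8, 16))         # 100 samples, 8 attributes, d=16
train_attrs = rng.integers(0, 2, size=(100, 8))
unseen = np.array([1, 0, 1, 1, 0, 0, 1, 0])
print(compose_unseen(train_feats, train_attrs, unseen, rng).shape)  # (64,)

Re-running the composition with different donors yields distinct features for the same unseen class, which is how changing the samples used for composition produces the diversity the abstract describes.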
\ No newline at end of file diff --git a/data/2020/neurips/Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection b/data/2020/neurips/Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection new file mode 100644 index 0000000000..1613d6ca93 --- /dev/null +++ b/data/2020/neurips/Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection @@ -0,0 +1 @@ +Weakly Supervised Object Detection (WSOD) has emerged as an effective tool to train object detectors using only image-level category labels. However, without object-level labels, WSOD detectors are prone to detecting bounding boxes on salient objects, clustered objects, and discriminative object parts. Moreover, the image-level category labels do not enforce consistent object detection across different transformations of the same images. To address the above issues, we propose a Comprehensive Attention Self-Distillation (CASD) training approach for WSOD. To balance feature learning among all object instances, CASD computes the comprehensive attention aggregated from multiple transformations and feature layers of the same images. To enforce consistent spatial supervision on objects, CASD conducts self-distillation on the WSOD networks, such that the comprehensive attention is approximated simultaneously by multiple transformations and feature layers of the same images. CASD produces new state-of-the-art WSOD results on standard benchmarks such as PASCAL VOC 2007/2012 and MS-COCO. \ No newline at end of file diff --git a/data/2020/neurips/Compressing Images by Encoding Their Latent Representations with Relative Entropy Coding b/data/2020/neurips/Compressing Images by Encoding Their Latent Representations with Relative Entropy Coding new file mode 100644 index 0000000000..6f15839369 --- /dev/null +++ b/data/2020/neurips/Compressing Images by Encoding Their Latent Representations with Relative Entropy Coding @@ -0,0 +1 @@ +Variational Autoencoders (VAEs) have seen widespread use in learned image compression. They are used to learn expressive latent representations on which downstream compression methods can operate with high efficiency. Recently proposed 'bits-back' methods can indirectly encode the latent representation of images with codelength close to the relative entropy between the latent posterior and the prior. However, due to the underlying algorithm, these methods can only be used for lossless compression, and they only achieve their nominal efficiency when compressing multiple images simultaneously; they are inefficient for compressing single images. As an alternative, we propose a novel method, Relative Entropy Coding (REC), that can directly encode the latent representation with codelength close to the relative entropy for single images, supported by our empirical results obtained on the Cifar10, ImageNet32 and Kodak datasets. Moreover, unlike previous bits-back methods, REC is immediately applicable to lossy compression, where it is competitive with the state of the art on the Kodak dataset.
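For intuition about the codelength REC targets, the sketch below computes the relative entropy (in bits) between a diagonal-Gaussian latent posterior and a standard-normal prior, which is the nominal per-image cost a direct latent code would approach. The Gaussian form and the numbers are illustrative assumptions; the actual REC encoding scheme is not shown.

import numpy as np

# KL( N(mu, diag(sigma^2)) || N(0, I) ) in nats, converted to bits.
# REC encodes a latent sample with codelength close to this quantity
# (up to lower-order overhead), even for a single image.
def nominal_codelength_bits(mu, sigma):
    kl_nats = 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))
    return kl_nats / np.log(2.0)

mu = np.array([0.3, -1.2, 0.8])
sigma = np.array([0.9, 0.5, 1.1])
print(f"~{nominal_codelength_bits(mu, sigma):.2f} bits for this toy latent")

Bits-back schemes reach this rate only amortized across many images; the point of REC is to approach it one image at a time.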
\ No newline at end of file diff --git a/data/2020/neurips/Computing Valid p-value for Optimal Changepoint by Selective Inference using Dynamic Programming b/data/2020/neurips/Computing Valid p-value for Optimal Changepoint by Selective Inference using Dynamic Programming new file mode 100644 index 0000000000..e1e14b0488 --- /dev/null +++ b/data/2020/neurips/Computing Valid p-value for Optimal Changepoint by Selective Inference using Dynamic Programming @@ -0,0 +1 @@ +There is a vast body of literature on methods for detecting changepoints (CPs). However, less attention has been paid to assessing the statistical reliability of the detected CPs. In this paper, we introduce a novel method to perform statistical inference on the significance of CPs estimated by a Dynamic Programming (DP)-based optimal CP detection algorithm. Based on the selective inference (SI) framework, we propose an exact (non-asymptotic) approach to compute valid p-values for testing the significance of the CPs. Although it is well known that SI has low statistical power because of over-conditioning, we address this disadvantage by introducing parametric programming techniques. Then, we propose an efficient method to conduct SI with the minimum amount of conditioning, leading to high statistical power. We conduct experiments on both synthetic and real-world datasets, through which we offer evidence that our proposed method is more powerful than existing methods, achieves decent computational efficiency, and provides good results in many practical applications. \ No newline at end of file diff --git a/data/2020/neurips/Conditioning and Processing: Techniques to Improve Information-Theoretic Generalization Bounds b/data/2020/neurips/Conditioning and Processing: Techniques to Improve Information-Theoretic Generalization Bounds new file mode 100644 index 0000000000..cc14e4c19f --- /dev/null +++ b/data/2020/neurips/Conditioning and Processing: Techniques to Improve Information-Theoretic Generalization Bounds @@ -0,0 +1 @@ +Obtaining generalization bounds for learning algorithms is one of the main subjects studied in theoretical machine learning. In recent years, information-theoretic bounds on generalization have gained the attention of researchers. This approach provides insight into learning algorithms by considering the mutual information between the model and the training set. In this paper, a probabilistic graphical representation of this approach is adopted and two general techniques to improve the bounds are introduced, namely conditioning and processing. In conditioning, a random variable in the graph is treated as given, while in processing a random variable is substituted with one of its children. These techniques can be used to improve the bounds by either sharpening them or increasing their applicability. It is demonstrated that the proposed framework provides a simple and unified way to explain a variety of recent tightening results. New improved bounds derived by utilizing these techniques are also proposed.
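As a worked toy instance of the kind of bound this line of work refines, the sketch below evaluates the classical information-theoretic generalization bound $|\mathbb{E}[\text{gap}]| \le \sqrt{2\sigma^2 I(W;S)/n}$ for a $\sigma$-subgaussian loss (Xu and Raginsky, 2017). The mutual-information value is made up for illustration; conditioning and processing would replace $I(W;S)$ with a smaller conditional or processed term, shrinking the bound.

import numpy as np

# Classical information-theoretic bound on the expected generalization gap
# for a sigma-subgaussian loss: sqrt(2 * sigma^2 * I(W; S) / n).
def it_generalization_bound(mi_nats, sigma, n):
    return np.sqrt(2.0 * sigma**2 * mi_nats / n)

# Illustrative numbers: I(W; S) = 5 nats, sigma = 1, n = 1000 samples.
print(it_generalization_bound(mi_nats=5.0, sigma=1.0, n=1000))  # 0.1

Halving the mutual-information term, as a successful conditioning or processing step might, shrinks the bound by a factor of sqrt(2), which is why such refinements are worthwhile.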
\ No newline at end of file diff --git a/data/2020/neurips/Confidence sequences for sampling without replacement b/data/2020/neurips/Confidence sequences for sampling without replacement new file mode 100644 index 0000000000..4b6afefaa3 --- /dev/null +++ b/data/2020/neurips/Confidence sequences for sampling without replacement @@ -0,0 +1 @@ +Many practical tasks involve sampling sequentially without replacement (WoR) from a finite population of size $N$, in an attempt to estimate some parameter $\theta^\star$. Accurately quantifying uncertainty throughout this process is a nontrivial task, but is necessary because it often determines when we stop collecting samples and confidently report a result. We present a suite of tools for designing confidence sequences (CS) for $\theta^\star$. A CS is a sequence of confidence sets $(C_n)_{n=1}^N$ that shrink in size and all contain $\theta^\star$ simultaneously with high probability. We first exploit a relationship between Bayesian posteriors and martingales to construct a (frequentist) CS for the parameters of a hypergeometric distribution. We then present Hoeffding- and empirical-Bernstein-type time-uniform CSs and fixed-time confidence intervals for sampling WoR which improve on previous bounds in the literature. \ No newline at end of file diff --git a/data/2020/neurips/Conformal Symplectic and Relativistic Optimization b/data/2020/neurips/Conformal Symplectic and Relativistic Optimization new file mode 100644 index 0000000000..d81128b440 --- /dev/null +++ b/data/2020/neurips/Conformal Symplectic and Relativistic Optimization @@ -0,0 +1 @@ +Arguably, the two most popular accelerated or momentum-based optimization methods in machine learning are Nesterov’s accelerated gradient and Polyak’s heavy ball, both corresponding to different discretizations of a particular second-order differential equation with friction. Such connections with continuous-time dynamical systems have been instrumental in demystifying acceleration phenomena in optimization. Here we study structure-preserving discretizations for a certain class of dissipative (conformal) Hamiltonian systems, allowing us to analyse the symplectic structure of both Nesterov and heavy ball, besides providing several new insights into these methods. Moreover, we propose a new algorithm based on a dissipative relativistic system that normalizes the momentum and may result in more stable/faster optimization. Importantly, such a method generalizes both Nesterov and heavy ball, each being recovered as distinct limiting cases, and has potential advantages at no additional cost. \ No newline at end of file diff --git a/data/2020/neurips/Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning b/data/2020/neurips/Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning new file mode 100644 index 0000000000..296af08253 --- /dev/null +++ b/data/2020/neurips/Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning @@ -0,0 +1 @@ +Off-policy evaluation of sequential decision policies from observational data is necessary in applications of batch reinforcement learning such as education and healthcare. In such settings, however, unobserved variables confound observed actions, rendering exact evaluation of new policies impossible, i.e., unidentifiable.
We develop a robust approach that estimates sharp bounds on the (unidentifiable) value of a given policy in an infinite-horizon problem, given data from another policy with unobserved confounding, subject to a sensitivity model. We consider stationary or baseline unobserved confounding and compute bounds by optimizing over the set of all stationary state-occupancy ratios that agree with a new partially identified estimating equation and the sensitivity model. We prove convergence to the sharp bounds as we collect more confounded data. Although checking set membership is a linear program, the support function is given by a difficult nonconvex optimization problem. We develop approximations based on nonconvex projected gradient descent and demonstrate the resulting bounds empirically. \ No newline at end of file diff --git a/data/2020/neurips/Conic Descent and its Application to Memory-efficient Optimization over Positive Semidefinite Matrices b/data/2020/neurips/Conic Descent and its Application to Memory-efficient Optimization over Positive Semidefinite Matrices new file mode 100644 index 0000000000..702ff7c833 --- /dev/null +++ b/data/2020/neurips/Conic Descent and its Application to Memory-efficient Optimization over Positive Semidefinite Matrices @@ -0,0 +1 @@ +We present an extension of the conditional gradient method to problems whose feasible sets are convex cones. We provide a convergence analysis for the method and for variants with nonconvex objectives, and we extend the analysis to practical cases with effective line search strategies. For the specific case of the positive semidefinite cone, we present a memory-efficient version based on randomized matrix sketches and advocate a heuristic greedy step that greatly improves its practical performance. Numerical results on phase retrieval and matrix completion problems indicate that our method can offer substantial advantages over traditional conditional gradient and Burer-Monteiro approaches. \ No newline at end of file diff --git a/data/2020/neurips/Consequences of Misaligned AI b/data/2020/neurips/Consequences of Misaligned AI new file mode 100644 index 0000000000..0e7d7fc3e6 --- /dev/null +++ b/data/2020/neurips/Consequences of Misaligned AI @@ -0,0 +1 @@ +AI systems often rely on two key components: a specified goal or reward function and an optimization algorithm to compute the optimal behavior for that goal. This approach is intended to provide value for a principal: the user on whose behalf the agent acts. The objectives given to these agents often capture only a partial specification of the principal's goals. We consider the cost of this incompleteness by analyzing a model of a principal and an agent in a resource-constrained world where the $L$ attributes of the state correspond to different sources of utility for the principal. We assume that the reward function given to the agent only has support on $J